Dear all,
I am reporting what I believe to be a regression caused by c0a40097f0bc81deafc15f9195d1fb54595cd6d0.
After this change I am experiencing long boot times on a system that appears to have a faulty USB device. The boot stalls while the kernel retries (and ultimately fails) to enumerate the USB device, and only continues once enumeration is abandoned. On Arch Linux this shows up as a systemd message about a wait job on journald; journald starts right after the enumeration fails with "unable to enumerate USB device". As a result, boot takes on average about 1 minute longer than usual (it is usually around 10 s). No stable kernel before this change exhibits the issue; all stable kernels after it do.
See the related USB messages below (these messages are continuous and have not been snipped):
[...]
[ 9.640854] usb 1-9: device descriptor read/64, error -110
[ 25.147505] usb 1-9: device descriptor read/64, error -110
[ 25.650779] usb 1-9: new high-speed USB device number 5 using xhci_hcd
[ 30.907482] usb 1-9: device descriptor read/64, error -110
[ 46.480900] usb 1-9: device descriptor read/64, error -110
[ 46.589883] usb usb1-port9: attempt power cycle
[ 46.990815] usb 1-9: new high-speed USB device number 6 using xhci_hcd
[ 51.791571] usb 1-9: Device not responding to setup address.
[ 56.801594] usb 1-9: Device not responding to setup address.
[ 57.010803] usb 1-9: device not accepting address 6, error -71
[ 57.137485] usb 1-9: new high-speed USB device number 7 using xhci_hcd
[ 61.937624] usb 1-9: Device not responding to setup address.
[ 66.947485] usb 1-9: Device not responding to setup address.
[ 67.154086] usb 1-9: device not accepting address 7, error -71
[ 67.156426] usb usb1-port9: unable to enumerate USB device
[...]
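The timestamps in these messages give a rough idea of how long the enumeration attempts last. The following is a minimal Python sketch, not part of the original report, that parses the quoted lines and computes the time between the first and last message. The helper names (`parse`, `stalled_seconds`) are made up for illustration, and because the excerpt starts with "[...]" the result is only a lower bound on the real stall.

```python
import re

# Log lines copied from the excerpt above (timestamps are seconds since boot).
LOG = """\
[ 9.640854] usb 1-9: device descriptor read/64, error -110
[ 25.147505] usb 1-9: device descriptor read/64, error -110
[ 25.650779] usb 1-9: new high-speed USB device number 5 using xhci_hcd
[ 30.907482] usb 1-9: device descriptor read/64, error -110
[ 46.480900] usb 1-9: device descriptor read/64, error -110
[ 46.589883] usb usb1-port9: attempt power cycle
[ 46.990815] usb 1-9: new high-speed USB device number 6 using xhci_hcd
[ 51.791571] usb 1-9: Device not responding to setup address.
[ 56.801594] usb 1-9: Device not responding to setup address.
[ 57.010803] usb 1-9: device not accepting address 6, error -71
[ 57.137485] usb 1-9: new high-speed USB device number 7 using xhci_hcd
[ 61.937624] usb 1-9: Device not responding to setup address.
[ 66.947485] usb 1-9: Device not responding to setup address.
[ 67.154086] usb 1-9: device not accepting address 7, error -71
[ 67.156426] usb usb1-port9: unable to enumerate USB device
"""

# "[ <seconds>] <message>" as printed by the kernel log.
LINE_RE = re.compile(r"\[\s*(\d+\.\d+)\]\s+(.*)")


def parse(log):
    """Return a list of (timestamp, message) tuples, skipping unparsable lines."""
    events = []
    for line in log.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            events.append((float(match.group(1)), match.group(2)))
    return events


def stalled_seconds(events):
    """Time between the first and the last message in the excerpt."""
    return events[-1][0] - events[0][0]


events = parse(LOG)
print(f"{len(events)} messages spanning {stalled_seconds(events):.1f} s")
# prints: 15 messages spanning 57.5 s
```

The roughly 57 s span is consistent with the "about 1 minute longer" figure reported above.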
This issue does not manifest in 44a45be57f85. I am available to test any patches on my system, since I understand this could be quite hard to reproduce elsewhere. I can also provide more information, with guidance if needed, to help troubleshoot the issue further.
Wishing you all a good day.
#regzbot introduced: c0a40097f0bc81deafc15f9195d1fb54595cd6d0
On Thu, Mar 06, 2025 at 12:32:59AM +0800, Seïfane Idouchach wrote:
> [...]
> [ 9.640854] usb 1-9: device descriptor read/64, error -110
> [ 25.147505] usb 1-9: device descriptor read/64, error -110
> [ 25.650779] usb 1-9: new high-speed USB device number 5 using xhci_hcd
> [ 30.907482] usb 1-9: device descriptor read/64, error -110
> [ 46.480900] usb 1-9: device descriptor read/64, error -110
> [ 46.589883] usb usb1-port9: attempt power cycle
> [ 46.990815] usb 1-9: new high-speed USB device number 6 using xhci_hcd
> [ 51.791571] usb 1-9: Device not responding to setup address.
> [ 56.801594] usb 1-9: Device not responding to setup address.
> [ 57.010803] usb 1-9: device not accepting address 6, error -71
> [ 57.137485] usb 1-9: new high-speed USB device number 7 using xhci_hcd
> [ 61.937624] usb 1-9: Device not responding to setup address.
> [ 66.947485] usb 1-9: Device not responding to setup address.
> [ 67.154086] usb 1-9: device not accepting address 7, error -71
> [ 67.156426] usb usb1-port9: unable to enumerate USB device
That's a real issue, but should not be due to the commit id you referenced.
> [...]
> This issue does not manifest in 44a45be57f85.
What does that commit have to do with this? That's just a build break fix.
> #regzbot introduced: c0a40097f0bc81deafc15f9195d1fb54595cd6d0
We know there are issues here. That commit was "fixed" by commit 15fffc6a5624 ("driver core: Fix uevent_show() vs driver detach race"), but then that caused a different problem, so it was reverted by commit 9a71892cbcdb ("Revert "driver core: Fix uevent_show() vs driver detach race"").
There are many discussions about this on the mailing list, with a proposal to add Dan's "fix" back. If you could try that, it would be great to see.
I think your USB problem is different here, but if you add 15fffc6a5624 ("driver core: Fix uevent_show() vs driver detach race") to your kernel, that would be great to see.
thanks,
greg k-h
On Thu, Mar 6, 2025 at 2:26 AM Greg KH <gregkh@linuxfoundation.org> wrote:
Hello Greg,
Thank you for your time.
> What does that commit have to do with this? That's just a build break fix.
This commit comes right before what seems to be the bad commit. I got to the cited (maybe) bad commit after a bisection and wanted to confirm the results.
> I think your USB problem is different here, but if you add 15fffc6a5624 ("driver core: Fix uevent_show() vs driver detach race") to your kernel, that would be great to see.
After reapplying the patch (15fffc6a5624) on v6.13 (ffd294d346d1), the issue is indeed not resolved. The behavior is a bit different from that at the reported commit (c0a40097f0bc): the stall now seems to happen earlier in the boot, before systemd has even started, since there is no mention of a wait job. The end result is the same, however: the boot only continues after the "unable to enumerate USB device" message.
I remain available if you have anything else.
Dear all,
I continued bisecting while applying Dan's fix (15fffc6a5624) along the way. While the patch solves the problem for some commits, I seem to be hitting another commit that brings the problem back (25f51b76f90f10f9bf2fbc05fc51cf685da7ccad).
I tested on top of v6.14-rc5 (7eb172143d5508), which has the issue; applying the fix and reverting the bad commit (25f51b76f90f10) fixes it. Both the fix and the revert are needed to resolve the issue.
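For anyone who wants to build the same test tree, the steps described above might look like the following Python sketch. It is an illustration only: the original message does not say which git commands were used, so `git cherry-pick` for reapplying the fix and `git revert` for the suspect commit are assumptions, as are the helper names `tree_commands` and `prepare`. By default the script only prints the commands (dry run); the cherry-pick may need manual conflict resolution on some trees.

```python
import subprocess

BASE = "v6.14-rc5"          # tag the test was based on
FIX = "15fffc6a5624"        # "driver core: Fix uevent_show() vs driver detach race"
SUSPECT = "25f51b76f90f10"  # commit found by the second bisection


def tree_commands(base=BASE, fix=FIX, suspect=SUSPECT):
    """Return the git commands for: check out base, reapply fix, revert suspect."""
    return [
        ["git", "checkout", "--detach", base],
        ["git", "cherry-pick", fix],               # assumption: fix applied via cherry-pick
        ["git", "revert", "--no-edit", suspect],
    ]


def prepare(repo, dry_run=True):
    """Print each command and, unless dry_run is set, run it inside repo."""
    for cmd in tree_commands():
        print("$", " ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, cwd=repo, check=True)


# Dry run: only shows what would be executed in the kernel source tree.
prepare(".", dry_run=True)
```

Calling `prepare("/path/to/linux", dry_run=False)` would execute the commands; the path is a placeholder.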
Let me know your thoughts on this.
On Fri, Mar 07, 2025 at 08:58:04PM +0800, Seïfane Idouchach wrote:
> Dear all,
> I continued bisecting while applying Dan's fix (15fffc6a5624) along the way. While the patch solves the problem for some commits, I seem to be hitting another commit that brings the problem back (25f51b76f90f10f9bf2fbc05fc51cf685da7ccad).
That is a totally different change, I think you have something odd here as these bisection points are very confusing.
> I tested on top of v6.14-rc5 (7eb172143d5508), which has the issue; applying the fix and reverting the bad commit (25f51b76f90f10) fixes it. Both the fix and the revert are needed to resolve the issue.
> Let me know your thoughts on this.
I think you have a mix of problems here. Let's fix up all of those error messages in the log first. Dan's fix has nothing to do with that at all, once the USB bus connection stuff is resolved, then it should be ok.
As that xhci commit you point at is showing an issue, are you sure that you are properly building the right xhci driver into the system? Do you have a Renesas xhci controller? What is the output of 'lspci'?
thanks,
greg k-h
> That is a totally different change, I think you have something odd here as these bisection points are very confusing.
I can only agree. I was skeptical that reverting this commit would fix the issue but it does.
> I think you have a mix of problems here. Let's fix up all of those error messages in the log first. Dan's fix has nothing to do with that at all, once the USB bus connection stuff is resolved, then it should be ok.
Are you suggesting that those messages should be fixed first? I am sorry if I was not clear before: those messages are always present, even on a "good" build. The issue is that on a "bad" build they hold back the boot process. USB functionality is never affected.
> As that xhci commit you point at is showing an issue, are you sure that you are properly building the right xhci driver into the system? Do you have a Renesas xhci controller? What is the output of 'lspci'?
I am building with a config based on my current distribution, Arch Linux, with olddefconfig. A quick grep for the values found in the commit returns the following:
CONFIG_USB_XHCI_PCI=y
CONFIG_USB_XHCI_PCI_RENESAS=m
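To make the meaning of those two values explicit, here is a small Python sketch (an illustration, not something from the original message) that reads a kernel config fragment and reports whether each option is built in (`y`), built as a module (`m`), or not set. The function names and the third option queried at the end are hypothetical.

```python
# Config lines as quoted above.
CONFIG_FRAGMENT = """\
CONFIG_USB_XHCI_PCI=y
CONFIG_USB_XHCI_PCI_RENESAS=m
"""


def parse_config(text):
    """Map option names to values, ignoring blank lines and comments."""
    options = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # "# CONFIG_FOO is not set" lines are comments
        name, _, value = line.partition("=")
        options[name] = value
    return options


def describe(options, name):
    """Human-readable build state of one option."""
    return {"y": "built in", "m": "module"}.get(options.get(name), "not set")


options = parse_config(CONFIG_FRAGMENT)
for name in ("CONFIG_USB_XHCI_PCI", "CONFIG_USB_XHCI_PCI_RENESAS"):
    print(f"{name}: {describe(options, name)}")
```

With this fragment the generic xHCI PCI driver is built into the kernel, while the Renesas support is built as a loadable module.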
lspci as requested:
00:00.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse Root Complex [1022:1480]
00:00.2 IOMMU [0806]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse IOMMU [1022:1481]
00:01.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse PCIe Dummy Host Bridge [1022:1482]
00:01.1 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse GPP Bridge [1022:1483]
00:01.2 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse GPP Bridge [1022:1483]
00:02.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse PCIe Dummy Host Bridge [1022:1482]
00:03.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse PCIe Dummy Host Bridge [1022:1482]
00:03.1 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse GPP Bridge [1022:1483]
00:04.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse PCIe Dummy Host Bridge [1022:1482]
00:05.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse PCIe Dummy Host Bridge [1022:1482]
00:07.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse PCIe Dummy Host Bridge [1022:1482]
00:07.1 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse Internal PCIe GPP Bridge 0 to bus[E:B] [1022:1484]
00:08.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse PCIe Dummy Host Bridge [1022:1482]
00:08.1 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse Internal PCIe GPP Bridge 0 to bus[E:B] [1022:1484]
00:14.0 SMBus [0c05]: Advanced Micro Devices, Inc. [AMD] FCH SMBus Controller [1022:790b] (rev 61)
00:14.3 ISA bridge [0601]: Advanced Micro Devices, Inc. [AMD] FCH LPC Bridge [1022:790e] (rev 51)
00:18.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Matisse/Vermeer Data Fabric: Device 18h; Function 0 [1022:1440]
00:18.1 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Matisse/Vermeer Data Fabric: Device 18h; Function 1 [1022:1441]
00:18.2 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Matisse/Vermeer Data Fabric: Device 18h; Function 2 [1022:1442]
00:18.3 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Matisse/Vermeer Data Fabric: Device 18h; Function 3 [1022:1443]
00:18.4 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Matisse/Vermeer Data Fabric: Device 18h; Function 4 [1022:1444]
00:18.5 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Matisse/Vermeer Data Fabric: Device 18h; Function 5 [1022:1445]
00:18.6 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Matisse/Vermeer Data Fabric: Device 18h; Function 6 [1022:1446]
00:18.7 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Matisse/Vermeer Data Fabric: Device 18h; Function 7 [1022:1447]
01:00.0 Non-Volatile memory controller [0108]: Kingston Technology Company, Inc. A2000 NVMe SSD [SM2263EN] [2646:2263] (rev 03)
02:00.0 USB controller [0c03]: Advanced Micro Devices, Inc. [AMD] 500 Series Chipset USB 3.1 XHCI Controller [1022:43ee]
02:00.1 SATA controller [0106]: Advanced Micro Devices, Inc. [AMD] 500 Series Chipset SATA Controller [1022:43eb]
02:00.2 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] 500 Series Chipset Switch Upstream Port [1022:43e9]
03:00.0 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Device [1022:43ea]
03:09.0 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Device [1022:43ea]
04:00.0 Ethernet controller [0200]: Intel Corporation 82599ES 10-Gigabit SFI/SFP+ Network Connection [8086:10fb] (rev 01)
2a:00.0 Ethernet controller [0200]: Realtek Semiconductor Co., Ltd. RTL8125 2.5GbE Controller [10ec:8125] (rev 05)
2b:00.0 VGA compatible controller [0300]: NVIDIA Corporation TU106 [GeForce RTX 2070 Rev. A] [10de:1f07] (rev a1)
2b:00.1 Audio device [0403]: NVIDIA Corporation TU106 High Definition Audio Controller [10de:10f9] (rev a1)
2b:00.2 USB controller [0c03]: NVIDIA Corporation TU106 USB 3.1 Host Controller [10de:1ada] (rev a1)
2b:00.3 Serial bus controller [0c80]: NVIDIA Corporation TU106 USB Type-C UCSI Controller [10de:1adb] (rev a1)
2c:00.0 Non-Essential Instrumentation [1300]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse PCIe Dummy Function [1022:148a]
2d:00.0 Non-Essential Instrumentation [1300]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse Reserved SPP [1022:1485]
2d:00.1 Encryption controller [1080]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse Cryptographic Coprocessor PSPCPP [1022:1486]
2d:00.3 USB controller [0c03]: Advanced Micro Devices, Inc. [AMD] Matisse USB 3.0 Host Controller [1022:149c]
2d:00.4 Audio device [0403]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse HD Audio Controller [1022:1487]
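To answer the Renesas question directly from this listing, the USB controllers can be filtered by PCI class code. The Python sketch below is an illustration added for clarity; it assumes that class `0c03` denotes USB controllers and that Renesas devices carry PCI vendor ID `1912`, and it only embeds a few lines of the listing above. The names `ENTRY_RE`, `parse_lspci` and `usb_controllers` are made up for this example.

```python
import re

# A subset of the lspci listing above: the three USB controllers plus one
# unrelated device to show that the filter skips other classes.
LSPCI = """\
02:00.0 USB controller [0c03]: Advanced Micro Devices, Inc. [AMD] 500 Series Chipset USB 3.1 XHCI Controller [1022:43ee]
04:00.0 Ethernet controller [0200]: Intel Corporation 82599ES 10-Gigabit SFI/SFP+ Network Connection [8086:10fb] (rev 01)
2b:00.2 USB controller [0c03]: NVIDIA Corporation TU106 USB 3.1 Host Controller [10de:1ada] (rev a1)
2d:00.3 USB controller [0c03]: Advanced Micro Devices, Inc. [AMD] Matisse USB 3.0 Host Controller [1022:149c]
"""

USB_CLASS = "0c03"       # assumption: PCI class code for USB controllers
RENESAS_VENDOR = "1912"  # assumption: PCI vendor ID used by Renesas

# slot, class name, [class], description, [vendor:device]
ENTRY_RE = re.compile(
    r"^(\S+) (.*?) \[([0-9a-f]{4})\]: (.*) \[([0-9a-f]{4}):([0-9a-f]{4})\]"
)


def parse_lspci(text):
    """Yield (slot, pci_class, vendor, device, description) per parsable line."""
    for line in text.splitlines():
        match = ENTRY_RE.match(line)
        if match:
            slot, _name, pci_class, desc, vendor, device = match.groups()
            yield slot, pci_class, vendor, device, desc


def usb_controllers(text):
    """Return only the entries whose class code marks a USB controller."""
    return [entry for entry in parse_lspci(text) if entry[1] == USB_CLASS]


controllers = usb_controllers(LSPCI)
for slot, _cls, vendor, device, desc in controllers:
    print(f"{slot} [{vendor}:{device}] {desc}")

has_renesas = any(vendor == RENESAS_VENDOR for _, _, vendor, _, _ in controllers)
print("Renesas xHCI controller present:", has_renesas)
```

Under those assumptions the listing shows three USB controllers (two AMD, one NVIDIA) and no Renesas one.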
Thanks for your time
Some development here,
I noticed today that while applying Dan's patch and reverting the "bad" commit resolves the issue, it only does so on a reboot. The boot is still slow on a cold boot. As you said, this might very well be a mix of different issues. It is my own fault for not reporting this regression earlier, thinking it would be fixed.
As a sanity check I retested old LTS releases. I find that v6.1 does not have the issue on cold boot while v6.6 does. The USB error messages are there regardless; they just don't delay the boot process. For what it's worth, I am almost 90% positive that those error messages have always been present on this system. I have gone through the troubleshooting step of unplugging all USB devices and headers, and the errors are still present.
If I find the time I might run another bisect between v6.1 and v6.6, doing cold boots instead of reboots, and report back. I am afraid I will just end up at the initially reported commit again, since this is what I did the first time.
Thank you.
linux-stable-mirror@lists.linaro.org