Hub driver warm-resets ports in SS.Inactive or Compliance mode to
recover a possible connected device. The port reset code correctly
detects if a connection is lost during reset, but hub driver
port_event() fails to take this into account in some cases.
port_event() ends up using stale values and assumes there is a
connected device, and will try all means to recover it, including
power-cycling the port.
Details:
This case was triggered when xHC host was suspended with DbC (Debug
Capability) enabled and connected. DbC turns one xHC port into a simple
usb debug device, allowing debugging a system with an A-to-A USB debug
cable.
xhci DbC code disables DbC when xHC is system suspended to D3, and
enables it back during resume.
We essentially end up with two hosts connected to each other during
suspend, and, for a short while during resume, until DbC is enabled back.
The suspended xHC host notices some activity on the roothub port, but
can't train the link due to being suspended, so xHC hardware sets a CAS
(Cold Attach Status) flag for this port to inform xhci host driver that
the port needs to be warm reset once xHC resumes.
CAS is xHCI specific, and not part of USB specification, so xhci driver
tells usb core that the port has a connection and link is in compliance
mode. Recovery from complinace mode is similar to CAS recovery.
xhci CAS driver support that fakes a compliance mode connection was added
in commit 8bea2bd37df0 ("usb: Add support for root hub port status CAS")
Once xHCI resumes and DbC is enabled back, all activity on the xHC
roothub host side port disappears. The hub driver will anyway think
port has a connection and link is in compliance mode, and hub driver
will try to recover it.
The port power-cycle during recovery seems to cause issues to the active
DbC connection.
Fix this by clearing connect_change flag if hub_port_reset() returns
-ENOTCONN, thus avoiding the whole unnecessary port recovery and
initialization attempt.
Cc: stable(a)vger.kernel.org
Fixes: 8bea2bd37df0 ("usb: Add support for root hub port status CAS")
Tested-by: Łukasz Bartosik <ukaszb(a)chromium.org>
Signed-off-by: Mathias Nyman <mathias.nyman(a)linux.intel.com>
---
drivers/usb/core/hub.c | 8 ++++++--
1 file changed, 6 insertions(+), 2 deletions(-)
diff --git a/drivers/usb/core/hub.c b/drivers/usb/core/hub.c
index 6bb6e92cb0a4..f981e365be36 100644
--- a/drivers/usb/core/hub.c
+++ b/drivers/usb/core/hub.c
@@ -5754,6 +5754,7 @@ static void port_event(struct usb_hub *hub, int port1)
struct usb_device *hdev = hub->hdev;
u16 portstatus, portchange;
int i = 0;
+ int err;
connect_change = test_bit(port1, hub->change_bits);
clear_bit(port1, hub->event_bits);
@@ -5850,8 +5851,11 @@ static void port_event(struct usb_hub *hub, int port1)
} else if (!udev || !(portstatus & USB_PORT_STAT_CONNECTION)
|| udev->state == USB_STATE_NOTATTACHED) {
dev_dbg(&port_dev->dev, "do warm reset, port only\n");
- if (hub_port_reset(hub, port1, NULL,
- HUB_BH_RESET_TIME, true) < 0)
+ err = hub_port_reset(hub, port1, NULL,
+ HUB_BH_RESET_TIME, true);
+ if (!udev && err == -ENOTCONN)
+ connect_change = 0;
+ else if (err < 0)
hub_port_disable(hub, port1, 1);
} else {
dev_dbg(&port_dev->dev, "do warm reset, full device\n");
--
2.43.0
Hi,
in the last week, after updating to 6.6.92, we’ve encountered a number of VMs reporting temporarily hung tasks blocking the whole system for a few minutes. They unblock by themselves and have similar tracebacks.
The IO PSIs show 100% pressure for that time, but the underlying devices are still processing read and write IO (well within their capacity). I’ve eliminated the underlying storage (Ceph) as the source of problems as I couldn’t find any latency outliers or significant queuing during that time.
I’ve seen somewhat similar reports on 6.6.88 and 6.6.77, but those might have been different outliers.
I’m attaching 3 logs - my intuition and the data so far leads me to consider this might be a kernel bug. I haven’t found a way to reproduce this, yet.
Regards,
Christian
--
Christian Theune · ct(a)flyingcircus.io · +49 345 219401 0
Flying Circus Internet Operations GmbH · https://flyingcircus.io
Leipziger Str. 70/71 · 06108 Halle (Saale) · Deutschland
HR Stendal HRB 21169 · Geschäftsführer: Christian Theune, Christian Zagrodnick
On Fri, Apr 11, 2025 at 10:14:44AM -0500, Mario Limonciello wrote:
> From: Mario Limonciello <mario.limonciello(a)amd.com>
>
> commit a5cfc9d65879c ("thunderbolt: Add wake on connect/disconnect
> on USB4 ports") introduced a sysfs file to control wake up policy
> for a given USB4 port that defaulted to disabled.
>
> However when testing commit 4bfeea6ec1c02 ("thunderbolt: Use wake
> on connect and disconnect over suspend") I found that it was working
> even without making changes to the power/wakeup file (which defaults
> to disabled). This is because of a logic error doing a bitwise or
> of the wake-on-connect flag with device_may_wakeup() which should
> have been a logical AND.
>
> Adjust the logic so that policy is only applied when wakeup is
> actually enabled.
>
> Fixes: a5cfc9d65879c ("thunderbolt: Add wake on connect/disconnect on USB4 ports")
> Signed-off-by: Mario Limonciello <mario.limonciello(a)amd.com>
Hi! There have been a couple of reports of a Thunderbolt regression in
recent stable kernels, and one reporter has now bisected it to this
change:
• https://bugzilla.kernel.org/show_bug.cgi?id=220284
• https://github.com/NixOS/nixpkgs/issues/420730
Both reporters are CCed, and say it starts working after the module is
reloaded.
Link: https://lore.kernel.org/r/bug-220284-208809@https.bugzilla.kernel.org%2F/
(for regzbot)
> ---
> drivers/thunderbolt/usb4.c | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/thunderbolt/usb4.c b/drivers/thunderbolt/usb4.c
> index e51d01671d8e7..3e96f1afd4268 100644
> --- a/drivers/thunderbolt/usb4.c
> +++ b/drivers/thunderbolt/usb4.c
> @@ -440,10 +440,10 @@ int usb4_switch_set_wake(struct tb_switch *sw, unsigned int flags)
> bool configured = val & PORT_CS_19_PC;
> usb4 = port->usb4;
>
> - if (((flags & TB_WAKE_ON_CONNECT) |
> + if (((flags & TB_WAKE_ON_CONNECT) &&
> device_may_wakeup(&usb4->dev)) && !configured)
> val |= PORT_CS_19_WOC;
> - if (((flags & TB_WAKE_ON_DISCONNECT) |
> + if (((flags & TB_WAKE_ON_DISCONNECT) &&
> device_may_wakeup(&usb4->dev)) && configured)
> val |= PORT_CS_19_WOD;
> if ((flags & TB_WAKE_ON_USB4) && configured)
> --
> 2.43.0
>
Bit 7 of the 'Device Type 2' (0Bh) register is reserved in the FSA9480
device, but is used by the FSA880 and TSU6111 devices.
From FSA9480 datasheet, Table 18. Device Type 2:
Reset Value: x0000000
===========================================================================
Bit # | Name | Size (Bits) | Description
---------------------------------------------------------------------------
7 | Reserved | 1 | NA
From FSA880 datasheet, Table 13. Device Type 2:
Reset Value: 0xxx0000
===========================================================================
Bit # | Name | Size (Bits) | Description
---------------------------------------------------------------------------
7 | Unknown | 1 | 1: Any accessory detected as unknown
| Accessory | | or an accessory that cannot be
| | | detected as being valid even
| | | though ID_CON is not floating
| | | 0: Unknown accessory not detected
From TSU6111 datasheet, Device Type 2:
Reset Value:x0000000
===========================================================================
Bit # | Name | Size (Bits) | Description
---------------------------------------------------------------------------
7 | Audio Type 3 | 1 | Audio device type 3
So the value obtained from the FSA9480_REG_DEV_T2 register in the
fsa9480_detect_dev() function may have the 7th bit set.
In this case, the 'dev' parameter in the fsa9480_handle_change() function
will be 15. And this will cause the 'cable_types' array to overflow when
accessed at this index.
Extend the 'cable_types' array with a new value 'DEV_RESERVED' as
specified in the FSA9480 datasheet. Do not use it as it serves for
various purposes in the listed devices.
Found by Linux Verification Center (linuxtesting.org) with SVACE.
Fixes: bad5b5e707a5 ("extcon: Add fsa9480 extcon driver")
Cc: stable(a)vger.kernel.org
Signed-off-by: Vladimir Moskovkin <Vladimir.Moskovkin(a)kaspersky.com>
---
drivers/extcon/extcon-fsa9480.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/drivers/extcon/extcon-fsa9480.c b/drivers/extcon/extcon-fsa9480.c
index b11b43171063..30972a7214f7 100644
--- a/drivers/extcon/extcon-fsa9480.c
+++ b/drivers/extcon/extcon-fsa9480.c
@@ -68,6 +68,7 @@
#define DEV_T1_CHARGER_MASK (DEV_DEDICATED_CHG | DEV_USB_CHG)
/* Device Type 2 */
+#define DEV_RESERVED 15
#define DEV_AV 14
#define DEV_TTY 13
#define DEV_PPD 12
@@ -133,6 +134,7 @@ static const u64 cable_types[] = {
[DEV_USB] = BIT_ULL(EXTCON_USB) | BIT_ULL(EXTCON_CHG_USB_SDP),
[DEV_AUDIO_2] = BIT_ULL(EXTCON_JACK_LINE_OUT),
[DEV_AUDIO_1] = BIT_ULL(EXTCON_JACK_LINE_OUT),
+ [DEV_RESERVED] = 0,
[DEV_AV] = BIT_ULL(EXTCON_JACK_LINE_OUT)
| BIT_ULL(EXTCON_JACK_VIDEO_OUT),
[DEV_TTY] = BIT_ULL(EXTCON_JIG),
@@ -228,7 +230,7 @@ static void fsa9480_detect_dev(struct fsa9480_usbsw *usbsw)
dev_err(usbsw->dev, "%s: failed to read registers", __func__);
return;
}
- val = val2 << 8 | val1;
+ val = val2 << 8 | (val1 & 0xFF);
dev_info(usbsw->dev, "dev1: 0x%x, dev2: 0x%x\n", val1, val2);
--
2.25.1
In the max3420_set_clear_feature() function, the endpoint index `id` can have a value from 0 to 15.
However, the udc->ep array is initialized with a maximum of 4 endpoints in max3420_eps_init().
If host sends a request with a wIndex greater than 3, the access to `udc->ep[id]` will go out-of-bounds,
leading to memory corruption or a potential kernel crash.
This bug was found by code inspection and has not been tested on hardware.
Fixes: 48ba02b2e2b1a ("usb: gadget: add udc driver for max3420")
Cc: stable(a)vger.kernel.org
Signed-off-by: Seungjin Bae <eeodqql09(a)gmail.com>
---
drivers/usb/gadget/udc/max3420_udc.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/drivers/usb/gadget/udc/max3420_udc.c b/drivers/usb/gadget/udc/max3420_udc.c
index 7349ea774adf..e4ecc7f7f3be 100644
--- a/drivers/usb/gadget/udc/max3420_udc.c
+++ b/drivers/usb/gadget/udc/max3420_udc.c
@@ -596,6 +596,8 @@ static void max3420_set_clear_feature(struct max3420_udc *udc)
break;
id = udc->setup.wIndex & USB_ENDPOINT_NUMBER_MASK;
+ if (id >= MAX3420_MAX_EPS)
+ break;
ep = &udc->ep[id];
spin_lock_irqsave(&ep->lock, flags);
--
2.43.0
Hello,
The following is the original thread, where a bug was reported to the
linux-wireless and ath10k mailing lists. The specific bug has been
detailed clearly here.
https://lore.kernel.org/linux-wireless/690B1DB2-C9DC-4FAD-8063-4CED659B1701…
There is also a Bugzilla report by me, which was opened later:
https://bugzilla.kernel.org/show_bug.cgi?id=220264
As stated, it is highly encouraged to check out all the logs,
especially the line of IRQ #16 in /proc/interrupts.
Here is where all the logs are:
https://gist.github.com/BandhanPramanik/ddb0cb23eca03ca2ea43a1d832a16180
(these logs are taken from an Arch liveboot)
On my daily driver, I found these on my IRQ #16:
16: 173210 0 0 0 IR-IO-APIC
16-fasteoi i2c_designware.0, idma64.0, i801_smbus
The fixes stated on the Reddit post for this Wi-Fi card didn't quite
work. (But git-cloning the firmware files did give me some more time
to have stable internet)
This time, I had to go for the GRUB kernel parameters.
Right now, I'm using "irqpoll" to curb the errors caused.
"intel_iommu=off" did not work, and the Wi-Fi was constantly crashing
even then. Did not try out "pci=noaer" this time.
If it's of any concern, there is a very weird error in Chromium-based
browsers which has only happened after I started using irqpoll. When I
Google something, the background of the individual result boxes shows
as pure black, while the surrounding space is the usual
greyish-blackish, like we see in Dark Mode. Here is a picture of the
exact thing I'm experiencing: https://files.catbox.moe/mjew6g.png
If you notice anything in my logs/bug reports, please let me know.
(Because it seems like Wi-Fi errors are just a red herring, there are
some ACPI or PCIe-related errors in the computers of this model - just
a naive speculation, though.)
Thanking you,
Bandhan Pramanik