USB3 devices connected behind several external suspended hubs may not be detected when plugged in, due to aggressive hub runtime PM suspend.
The hub driver immediately runtime-suspends hubs if there are no active children or port activity.
There is a delay between the wake signal that causes the hub to resume and driver-visible port activity on the hub's downstream-facing ports. Most of the LFPS handshake, resume signaling and link training done on the downstream ports is not visible to the hub driver until it has completed, at which point the device appears fully enabled and running on the port.
This delay between the wake signal and a detectable port change is even more significant with chained suspended hubs, where the wake signal propagates upstream first. Suspended hubs only start resuming their downstream ports after the upstream-facing port has resumed.
The hub driver may resume a USB3 hub, read the status of all ports, not yet see any activity, and runtime-suspend the hub again before any port activity is visible.
This exact case was seen when connecting USB3 devices to a suspended Thunderbolt dock.
The USB3 specification defines a 100ms tU3WakeupRetryDelay, indicating that USB3 devices expect to be resumed within 100ms after signaling wake. If not, the device will resend the wake signal.
Give the USB3 hubs twice this time (200ms) to detect any port changes after resume, before allowing the hub to runtime-suspend again.
Cc: stable@vger.kernel.org
Fixes: 2839f5bcfcfc ("USB: Turn on auto-suspend for USB 3.0 hubs.")
Acked-by: Alan Stern stern@rowland.harvard.edu
Signed-off-by: Mathias Nyman mathias.nyman@linux.intel.com
---
v2 changes
 - Update commit in Fixes tag
 - Add Ack from Alan Stern
 drivers/usb/core/hub.c | 33 ++++++++++++++++++++++++++++++++-
 1 file changed, 32 insertions(+), 1 deletion(-)

diff --git a/drivers/usb/core/hub.c b/drivers/usb/core/hub.c
index 770d1e91183c..5c12dfdef569 100644
--- a/drivers/usb/core/hub.c
+++ b/drivers/usb/core/hub.c
@@ -68,6 +68,12 @@
  */
 #define USB_SHORT_SET_ADDRESS_REQ_TIMEOUT	500  /* ms */
 
+/*
+ * Give SS hubs 200ms time after wake to train downstream links before
+ * assuming no port activity and allowing hub to runtime suspend back.
+ */
+#define USB_SS_PORT_U0_WAKE_TIME	200  /* ms */
+
 /* Protect struct usb_device->state and ->children members
  * Note: Both are also protected by ->dev.sem, except that ->state can
  * change to USB_STATE_NOTATTACHED even when the semaphore isn't held. */
@@ -1068,11 +1074,12 @@ int usb_remove_device(struct usb_device *udev)
 
 enum hub_activation_type {
 	HUB_INIT, HUB_INIT2, HUB_INIT3,		/* INITs must come first */
-	HUB_POST_RESET, HUB_RESUME, HUB_RESET_RESUME,
+	HUB_POST_RESET, HUB_RESUME, HUB_RESET_RESUME, HUB_POST_RESUME,
 };
 
 static void hub_init_func2(struct work_struct *ws);
 static void hub_init_func3(struct work_struct *ws);
+static void hub_post_resume(struct work_struct *ws);
 
 static void hub_activate(struct usb_hub *hub, enum hub_activation_type type)
 {
@@ -1095,6 +1102,13 @@ static void hub_activate(struct usb_hub *hub, enum hub_activation_type type)
 			goto init2;
 		goto init3;
 	}
+
+	if (type == HUB_POST_RESUME) {
+		usb_autopm_put_interface_async(to_usb_interface(hub->intfdev));
+		hub_put(hub);
+		return;
+	}
+
 	hub_get(hub);
 
 	/* The superspeed hub except for root hub has to use Hub Depth
@@ -1343,6 +1357,16 @@ static void hub_activate(struct usb_hub *hub, enum hub_activation_type type)
 			device_unlock(&hdev->dev);
 	}
 
+	if (type == HUB_RESUME && hub_is_superspeed(hub->hdev)) {
+		/* give usb3 downstream links training time after hub resume */
+		INIT_DELAYED_WORK(&hub->init_work, hub_post_resume);
+		queue_delayed_work(system_power_efficient_wq, &hub->init_work,
+				   msecs_to_jiffies(USB_SS_PORT_U0_WAKE_TIME));
+		usb_autopm_get_interface_no_resume(
+			to_usb_interface(hub->intfdev));
+		return;
+	}
+
 	hub_put(hub);
 }
 
@@ -1361,6 +1385,13 @@ static void hub_init_func3(struct work_struct *ws)
 	hub_activate(hub, HUB_INIT3);
 }
 
+static void hub_post_resume(struct work_struct *ws)
+{
+	struct usb_hub *hub = container_of(ws, struct usb_hub, init_work.work);
+
+	hub_activate(hub, HUB_POST_RESUME);
+}
+
 enum hub_quiescing_type {
 	HUB_DISCONNECT, HUB_PRE_RESET, HUB_SUSPEND
 };
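For readers following the timing argument: the 200ms figure is twice the 100ms tU3WakeupRetryDelay quoted in the commit message, and the patch converts it with msecs_to_jiffies() before queueing the delayed work. A minimal userspace sketch of that arithmetic follows. The helper names ss_port_wake_time_ms() and model_msecs_to_jiffies() are hypothetical, and the round-up conversion is a simplified assumption; the kernel's real msecs_to_jiffies() handles more cases than this.

```c
#include <assert.h>

/* Value quoted from the USB3 specification in the commit message. */
#define T_U3_WAKEUP_RETRY_DELAY_MS 100

/* Hypothetical helper: the patch allows twice the wake retry delay. */
static unsigned int ss_port_wake_time_ms(void)
{
	return 2 * T_U3_WAKEUP_RETRY_DELAY_MS;
}

/*
 * Simplified model of a ms-to-ticks conversion that rounds up, so the
 * resulting delay is never shorter than requested.
 * hz is the tick rate in ticks per second (an assumed parameter here).
 */
static unsigned long model_msecs_to_jiffies(unsigned int ms, unsigned int hz)
{
	return ((unsigned long)ms * hz + 999) / 1000;
}
```

Under this model the 200ms window corresponds to 50 ticks at a tick rate of 250 and to 200 ticks at a tick rate of 1000.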
On 6/11/25 1:24 PM, Mathias Nyman wrote:
[...]
Hi, this patch seems to cause the following splat on QC SC8280XP CRD board when resuming the system:
[root@sc8280xp-crd ~]# ./suspend_test.sh
[ 37.887029] PM: suspend entry (s2idle)
[ 37.903850] Filesystems sync: 0.012 seconds
[ 37.915071] Freezing user space processes
[ 37.920925] Freezing user space processes completed (elapsed 0.001 seconds)
[ 37.928138] OOM killer disabled.
[ 37.931479] Freezing remaining freezable tasks
[ 37.937476] Freezing remaining freezable tasks completed (elapsed 0.001 seconds)
[ 38.397272] Unable to handle kernel paging request at virtual address dead00000000012a
[ 38.405444] Mem abort info:
[ 38.408349] ESR = 0x0000000096000044
[ 38.412231] EC = 0x25: DABT (current EL), IL = 32 bits
[ 38.417712] SET = 0, FnV = 0
[ 38.420873] EA = 0, S1PTW = 0
[ 38.424133] FSC = 0x04: level 0 translation fault
[ 38.429168] Data abort info:
[ 38.432150] ISV = 0, ISS = 0x00000044, ISS2 = 0x00000000
[ 38.437804] CM = 0, WnR = 1, TnD = 0, TagAccess = 0
[ 38.443014] GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
[ 38.448495] [dead00000000012a] address between user and kernel address ranges
[ 38.455852] Internal error: Oops: 0000000096000044 [#1] SMP
[ 38.461693] Modules linked in:
[ 38.464872] CPU: 0 UID: 0 PID: 0 Comm: swapper/0 Not tainted 6.16.0-rc3-next-20250623-00003-g85d3e4a2835b #12226 NONE
[ 38.475880] Hardware name: Qualcomm QRD, BIOS 6.0.230525.BOOT.MXF.1.1.c1-00114-MAKENA-1 05/25/2023
[ 38.485096] pstate: 804000c5 (Nzcv daIF +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[ 38.492263] pc : __run_timer_base+0x1e0/0x330
[ 38.496784] lr : __run_timer_base+0x1c4/0x330
[ 38.501291] sp : ffff800080003e80
[ 38.504718] x29: ffff800080003ee0 x28: ffff800080003e98 x27: dead000000000122
[ 38.512069] x26: 0000000000000000 x25: 0000000000000000 x24: ffffbc2c54fcdc80
[ 38.519417] x23: 0000000000000101 x22: ffff0000871002d0 x21: 00000000ffff99c6
[ 38.526766] x20: ffffbc2c54fc1f08 x19: ffff0001fef65dc0 x18: ffff800080005028
[ 38.534113] x17: 0000000000000001 x16: ffff0001fef65e60 x15: ffff0001fef65e20
[ 38.541472] x14: 0000000000000040 x13: ffff0000871002d0 x12: ffff800080003ea0
[ 38.548819] x11: 00000000e0000cc7 x10: ffffbc2c54f647c8 x9 : ffff800080003e98
[ 38.556178] x8 : dead000000000122 x7 : 0000000000000000 x6 : ffffbc2c5133c620
[ 38.563526] x5 : 0000000000000000 x4 : 0000000000000001 x3 : 0000000000000000
[ 38.570884] x2 : 0000000000000079 x1 : 000000000000007b x0 : 0000000000000001
[ 38.578233] Call trace:
[ 38.580771] __run_timer_base+0x1e0/0x330 (P)
[ 38.585279] run_timer_softirq+0x40/0x78
[ 38.589333] handle_softirqs+0x14c/0x3dc
[ 38.593404] __do_softirq+0x1c/0x2c
[ 38.597025] ____do_softirq+0x18/0x28
[ 38.600825] call_on_irq_stack+0x3c/0x50
[ 38.604890] do_softirq_own_stack+0x24/0x34
[ 38.609220] __irq_exit_rcu+0xc4/0x174
[ 38.613108] irq_exit_rcu+0x18/0x40
[ 38.616718] el1_interrupt+0x40/0x5c
[ 38.620423] el1h_64_irq_handler+0x20/0x30
[ 38.624662] el1h_64_irq+0x6c/0x70
[ 38.628181] arch_local_irq_enable+0x8/0xc (P)
[ 38.632787] cpuidle_enter+0x40/0x5c
[ 38.636484] call_cpuidle+0x24/0x48
[ 38.640104] do_idle+0x1a8/0x228
[ 38.643452] cpu_startup_entry+0x3c/0x40
[ 38.647507] kernel_init+0x0/0x138
[ 38.651026] start_kernel+0x334/0x3f0
[ 38.654828] __primary_switched+0x90/0x98
[ 38.658990] Code: 36000428 a94026c8 f9000128 b4000048 (f9000509)
[ 38.665273] ---[ end trace 0000000000000000 ]---
[ 38.670045] Kernel panic - not syncing: Oops: Fatal exception in interrupt
[ 38.677126] SMP: stopping secondary CPUs
Waiting for ssh to finish
Konrad
On Mon, Jun 23, 2025 at 10:31:17PM +0200, Konrad Dybcio wrote:
On 6/11/25 1:24 PM, Mathias Nyman wrote:
[...]
Hi, this patch seems to cause the following splat on QC SC8280XP CRD board when resuming the system:
[root@sc8280xp-crd ~]# ./suspend_test.sh
[ 37.887029] PM: suspend entry (s2idle)
[ 37.903850] Filesystems sync: 0.012 seconds
[ 37.915071] Freezing user space processes
[ 37.920925] Freezing user space processes completed (elapsed 0.001 seconds)
I don't know what could be causing this problem.
However, Mathias, I did notice a minor error in the patch when I read it again. It's in the new part of hub_activate() which does this:
+		queue_delayed_work(system_power_efficient_wq, &hub->init_work,
+				   msecs_to_jiffies(USB_SS_PORT_U0_WAKE_TIME));
+		usb_autopm_get_interface_no_resume(
+			to_usb_interface(hub->intfdev));
Once queue_delayed_work() has been called, it's possible that the work routine will run before the usb_autopm_get_interface_no_resume() call gets executed. These two calls should be made in the opposite order.
Alan Stern
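The ordering concern raised above can be illustrated with a small userspace model. This is only a sketch under stated assumptions: struct pm_model and the model_* functions are hypothetical stand-ins for the runtime PM usage counter and for hub_post_resume(), not kernel APIs, and the "work runs immediately" path represents the worst case described in the review rather than typical behavior.

```c
#include <assert.h>

/* Hypothetical stand-in for a runtime-PM usage counter. */
struct pm_model {
	int usage;      /* current usage count */
	int min_usage;  /* lowest value ever observed */
};

/* Models taking a reference, as usb_autopm_get_interface_no_resume() does. */
static void model_get(struct pm_model *pm)
{
	pm->usage++;
}

/* Models dropping a reference and records the lowest count seen. */
static void model_put(struct pm_model *pm)
{
	pm->usage--;
	if (pm->usage < pm->min_usage)
		pm->min_usage = pm->usage;
}

/* Stands in for the work routine, which drops the reference. */
static void model_work(struct pm_model *pm)
{
	model_put(pm);
}

/* Order in the posted patch, worst case: work runs before the get. */
static int queue_then_get(void)
{
	struct pm_model pm = { .usage = 0, .min_usage = 0 };

	model_work(&pm);   /* queued work fires first */
	model_get(&pm);    /* reference taken too late */
	return pm.min_usage;
}

/* Suggested order: take the reference, then queue the work. */
static int get_then_queue(void)
{
	struct pm_model pm = { .usage = 0, .min_usage = 0 };

	model_get(&pm);
	model_work(&pm);
	return pm.min_usage;
}
```

In this model queue_then_get() lets the count dip to -1 (a put without a matching get), while get_then_queue() never drops below its starting value.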
On 24.6.2025 2.32, Alan Stern wrote:
On Mon, Jun 23, 2025 at 10:31:17PM +0200, Konrad Dybcio wrote:
On 6/11/25 1:24 PM, Mathias Nyman wrote:
[...]
I don't know what could be causing this problem.
However, Mathias, I did notice a minor error in the patch when I read it again. It's in the new part of hub_activate() which does this:
queue_delayed_work(system_power_efficient_wq, &hub->init_work,
msecs_to_jiffies(USB_SS_PORT_U0_WAKE_TIME));
usb_autopm_get_interface_no_resume(
to_usb_interface(hub->intfdev));
Once queue_delayed_work() has been called, it's possible that the work routine will run before the usb_autopm_get_interface_no_resume() call gets executed. These two calls should be made in the opposite order.
Thanks, I'll fix that
-Mathias
On 23.6.2025 23.31, Konrad Dybcio wrote:
On 6/11/25 1:24 PM, Mathias Nyman wrote:
[...]
Thanks for the report. Does reverting this one patch fix the issue?
What does ./suspend_test.sh look like? Could it be triggered by (system) suspending the hub while the delayed work is still pending?
Thanks
Mathias
On 6/24/25 11:47 AM, Mathias Nyman wrote:
On 23.6.2025 23.31, Konrad Dybcio wrote:
On 6/11/25 1:24 PM, Mathias Nyman wrote:
[...]
Thanks for the report. Does reverting this one patch fix the issue?
It seems to, but the bug is not 100% reproducible (sometimes it takes 2+ sus/res cycles to trigger). Alan's change doesn't seem to have a consistent effect.
What does ./suspend_test.sh look like?
Nothing special:
# Set up RTC wakeup
echo +10 > /sys/class/rtc/rtc0/wakealarm;
# Go to sleep
echo mem > /sys/power/state

# Dump the AOSS sleep stats
grep ^ /sys/kernel/debug/qcom_stats/*
Could it be triggered by (system) suspending the hub while the delayed work is still pending?
Maybe. What I was able to confirm is that kicking USB nodes off of DT (i.e. removing USB controllers from the system) makes the platform no longer crash.
Konrad
On 24.6.2025 19.40, Konrad Dybcio wrote:
On 6/24/25 11:47 AM, Mathias Nyman wrote:
On 23.6.2025 23.31, Konrad Dybcio wrote:
On 6/11/25 1:24 PM, Mathias Nyman wrote:
[...]
I added some memory debugging but wasn't able to trigger this.
Does this oneliner help? It's a shot in the dark.
diff --git a/drivers/usb/core/hub.c b/drivers/usb/core/hub.c
index d41a6c239953..1cc853c428fc 100644
--- a/drivers/usb/core/hub.c
+++ b/drivers/usb/core/hub.c
@@ -1418,6 +1418,7 @@ static void hub_quiesce(struct usb_hub *hub, enum hub_quiescing_type type)
 
 	/* Stop hub_wq and related activity */
 	timer_delete_sync(&hub->irq_urb_retry);
+	flush_delayed_work(&hub->init_work);
 	usb_kill_urb(hub->urb);
 	if (hub->has_indicators)
 		cancel_delayed_work_sync(&hub->leds);
If not, then could you add 'initcall_debug' to kernel cmd line, and usb core dynamic debug before suspend test
mount -t debugfs none /sys/kernel/debug
echo 'module usbcore =p' >/sys/kernel/debug/dynamic_debug/control
Also curious about lsusb -t output
Thanks
Mathias
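The one-line change proposed in this message was described as a shot in the dark, and the thread does not establish a root cause. As a rough illustration of what flushing a pending delayed work item before quiescing is meant to guarantee, here is a userspace model. All names (struct dwork_model, model_queue, model_flush, quiesce_is_clean) are hypothetical, and the model only captures the idea that a flush cancels the delay timer and runs the handler right away.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a delayed work item and the reference it owns. */
struct dwork_model {
	bool timer_armed;  /* delay timer still pending */
	int runs;          /* times the handler has executed */
	int pm_refs;       /* references held on behalf of the work */
};

/* Models queueing: a reference is held until the handler runs. */
static void model_queue(struct dwork_model *w)
{
	w->pm_refs++;
	w->timer_armed = true;
}

/* Models the handler: runs once and drops the reference. */
static void model_handler(struct dwork_model *w)
{
	w->runs++;
	w->pm_refs--;
}

/* Models a flush: cancel the timer and run the handler immediately. */
static void model_flush(struct dwork_model *w)
{
	if (w->timer_armed) {
		w->timer_armed = false;
		model_handler(w);
	}
}

/* Returns true when nothing is left pending at quiesce time. */
static bool quiesce_is_clean(bool flush_first)
{
	struct dwork_model w = {0};

	model_queue(&w);
	if (flush_first)
		model_flush(&w);
	return !w.timer_armed && w.pm_refs == 0;
}
```

In this model, quiescing without the flush leaves both an armed timer and an outstanding reference, while quiescing after the flush leaves neither.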
On 6/25/25 5:11 PM, Mathias Nyman wrote:
On 24.6.2025 19.40, Konrad Dybcio wrote:
On 6/24/25 11:47 AM, Mathias Nyman wrote:
On 23.6.2025 23.31, Konrad Dybcio wrote:
On 6/11/25 1:24 PM, Mathias Nyman wrote:
[...]
I added some memory debugging but wasn't able to trigger this.
Does this oneliner help? It's a shot in the dark.
[...]
I can't seem to trigger the bug anymore with this (and Alan's change)!
[...]
Also curious about lsusb -t output
Just hubs:
[root@sc8280xp-crd ~]# lsusb -t
/: Bus 001.Port 001: Dev 001, Class=root_hub, Driver=xhci-hcd/1p, 480M
/: Bus 002.Port 001: Dev 001, Class=root_hub, Driver=xhci-hcd/1p, 10000M
/: Bus 003.Port 001: Dev 001, Class=root_hub, Driver=xhci-hcd/1p, 480M
/: Bus 004.Port 001: Dev 001, Class=root_hub, Driver=xhci-hcd/1p, 10000M
Konrad
On 25.6.2025 18.41, Konrad Dybcio wrote:
On 6/25/25 5:11 PM, Mathias Nyman wrote:
On 24.6.2025 19.40, Konrad Dybcio wrote:
On 6/24/25 11:47 AM, Mathias Nyman wrote:
On 23.6.2025 23.31, Konrad Dybcio wrote:
On 6/11/25 1:24 PM, Mathias Nyman wrote:
[...]
I can't seem to trigger the bug anymore with this (and Alan's change)!
Thanks for testing
I'll send a proper patch that does these changes
-Mathias
linux-stable-mirror@lists.linaro.org