On 24.6.2025 19.40, Konrad Dybcio wrote:
On 6/24/25 11:47 AM, Mathias Nyman wrote:
On 23.6.2025 23.31, Konrad Dybcio wrote:
On 6/11/25 1:24 PM, Mathias Nyman wrote:
Hi, this patch seems to cause the following splat on the QC SC8280XP CRD board when resuming the system:
[root@sc8280xp-crd ~]# ./suspend_test.sh
[ 37.887029] PM: suspend entry (s2idle)
[ 37.903850] Filesystems sync: 0.012 seconds
[ 37.915071] Freezing user space processes
[ 37.920925] Freezing user space processes completed (elapsed 0.001 seconds)
[ 37.928138] OOM killer disabled.
[ 37.931479] Freezing remaining freezable tasks
[ 37.937476] Freezing remaining freezable tasks completed (elapsed 0.001 seconds)
[ 38.397272] Unable to handle kernel paging request at virtual address dead00000000012a
[ 38.405444] Mem abort info:
[ 38.408349] ESR = 0x0000000096000044
[ 38.412231] EC = 0x25: DABT (current EL), IL = 32 bits
[ 38.417712] SET = 0, FnV = 0
[ 38.420873] EA = 0, S1PTW = 0
[ 38.424133] FSC = 0x04: level 0 translation fault
[ 38.429168] Data abort info:
[ 38.432150] ISV = 0, ISS = 0x00000044, ISS2 = 0x00000000
[ 38.437804] CM = 0, WnR = 1, TnD = 0, TagAccess = 0
[ 38.443014] GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
[ 38.448495] [dead00000000012a] address between user and kernel address ranges
[ 38.455852] Internal error: Oops: 0000000096000044 [#1] SMP
[ 38.461693] Modules linked in:
[ 38.464872] CPU: 0 UID: 0 PID: 0 Comm: swapper/0 Not tainted 6.16.0-rc3-next-20250623-00003-g85d3e4a2835b #12226 NONE
[ 38.475880] Hardware name: Qualcomm QRD, BIOS 6.0.230525.BOOT.MXF.1.1.c1-00114-MAKENA-1 05/25/2023
[ 38.485096] pstate: 804000c5 (Nzcv daIF +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[ 38.492263] pc : __run_timer_base+0x1e0/0x330
[ 38.496784] lr : __run_timer_base+0x1c4/0x330
[ 38.501291] sp : ffff800080003e80
[ 38.504718] x29: ffff800080003ee0 x28: ffff800080003e98 x27: dead000000000122
[ 38.512069] x26: 0000000000000000 x25: 0000000000000000 x24: ffffbc2c54fcdc80
[ 38.519417] x23: 0000000000000101 x22: ffff0000871002d0 x21: 00000000ffff99c6
[ 38.526766] x20: ffffbc2c54fc1f08 x19: ffff0001fef65dc0 x18: ffff800080005028
[ 38.534113] x17: 0000000000000001 x16: ffff0001fef65e60 x15: ffff0001fef65e20
[ 38.541472] x14: 0000000000000040 x13: ffff0000871002d0 x12: ffff800080003ea0
[ 38.548819] x11: 00000000e0000cc7 x10: ffffbc2c54f647c8 x9 : ffff800080003e98
[ 38.556178] x8 : dead000000000122 x7 : 0000000000000000 x6 : ffffbc2c5133c620
[ 38.563526] x5 : 0000000000000000 x4 : 0000000000000001 x3 : 0000000000000000
[ 38.570884] x2 : 0000000000000079 x1 : 000000000000007b x0 : 0000000000000001
[ 38.578233] Call trace:
[ 38.580771] __run_timer_base+0x1e0/0x330 (P)
[ 38.585279] run_timer_softirq+0x40/0x78
[ 38.589333] handle_softirqs+0x14c/0x3dc
[ 38.593404] __do_softirq+0x1c/0x2c
[ 38.597025] ____do_softirq+0x18/0x28
[ 38.600825] call_on_irq_stack+0x3c/0x50
[ 38.604890] do_softirq_own_stack+0x24/0x34
[ 38.609220] __irq_exit_rcu+0xc4/0x174
[ 38.613108] irq_exit_rcu+0x18/0x40
[ 38.616718] el1_interrupt+0x40/0x5c
[ 38.620423] el1h_64_irq_handler+0x20/0x30
[ 38.624662] el1h_64_irq+0x6c/0x70
[ 38.628181] arch_local_irq_enable+0x8/0xc (P)
[ 38.632787] cpuidle_enter+0x40/0x5c
[ 38.636484] call_cpuidle+0x24/0x48
[ 38.640104] do_idle+0x1a8/0x228
[ 38.643452] cpu_startup_entry+0x3c/0x40
[ 38.647507] kernel_init+0x0/0x138
[ 38.651026] start_kernel+0x334/0x3f0
[ 38.654828] __primary_switched+0x90/0x98
[ 38.658990] Code: 36000428 a94026c8 f9000128 b4000048 (f9000509)
[ 38.665273] ---[ end trace 0000000000000000 ]---
[ 38.670045] Kernel panic - not syncing: Oops: Fatal exception in interrupt
[ 38.677126] SMP: stopping secondary CPUs
Waiting for ssh to finish
Thanks for the report. Does reverting this one patch fix the issue?
It seems to, but the bug is not 100% reproducible (sometimes it takes 2+ suspend/resume cycles to trigger). Alan's change doesn't seem to have a consistent effect.
What does ./suspend_test.sh look like?
Nothing special:
# Set up RTC wakeup
echo +10 > /sys/class/rtc/rtc0/wakealarm;
# Go to sleep
echo mem > /sys/power/state

# Dump the AOSS sleep stats
grep ^ /sys/kernel/debug/qcom_stats/*
I added some memory debugging but wasn't able to trigger this.
Does this oneliner help? It's a shot in the dark.
diff --git a/drivers/usb/core/hub.c b/drivers/usb/core/hub.c
index d41a6c239953..1cc853c428fc 100644
--- a/drivers/usb/core/hub.c
+++ b/drivers/usb/core/hub.c
@@ -1418,6 +1418,7 @@ static void hub_quiesce(struct usb_hub *hub, enum hub_quiescing_type type)
 
 	/* Stop hub_wq and related activity */
 	timer_delete_sync(&hub->irq_urb_retry);
+	flush_delayed_work(&hub->init_work);
 	usb_kill_urb(hub->urb);
 	if (hub->has_indicators)
 		cancel_delayed_work_sync(&hub->leds);
If not, could you add 'initcall_debug' to the kernel command line and enable usbcore dynamic debug before running the suspend test:
mount -t debugfs none /sys/kernel/debug
echo 'module usbcore =p' >/sys/kernel/debug/dynamic_debug/control
Also curious about lsusb -t output
Thanks
Mathias