From: Li Jun jun.li@nxp.com
While remove host(e.g. for USB role switch from host to device), if runtime pm is enabled by user, below oops are occurs at dwc3 and cdns3 platform. Keep the xhci-plat device being active during remove host fixes them.
oops1: Unable to handle kernel NULL pointer dereference at virtual address 0000000000000240 Mem abort info: ESR = 0x96000004 xhci-hcd xhci-hcd.1.auto: // Halt the HC xhci-hcd xhci-hcd.1.auto: // Reset the HC xhci-hcd xhci-hcd.1.auto: Wait for controller to be ready for doorbell rings EC = 0x25: DABT (current EL), IL = 32 bits SET = 0, FnV = 0 EA = 0, S1PTW = 0 Data abort info: ISV = 0, ISS = 0x00000004 xhci-hcd xhci-hcd.1.auto: // Disabling event ring interrupts CM = 0, WnR = 0 user pgtable: 4k pages, 48-bit VAs, pgdp=00000001b7b18000 xhci-hcd xhci-hcd.1.auto: cleaning up memory xhci-hcd xhci-hcd.1.auto: Freed event ring xhci-hcd xhci-hcd.1.auto: Freed command ring [0000000000000240] pgd=0000000000000000 Internal error: Oops: 96000004 [#1] PREEMPT SMP Modules linked in: CPU: 0 PID: 5 Comm: kworker/0:0 Not tainted 5.4.3-00107-g64d454a-dirty #1219 Hardware name: FSL i.MX8MP EVK (DT) Workqueue: pm pm_runtime_work pstate: 60000005 (nZCv daif -PAN -UAO) pc : xhci_suspend+0x34/0x698 lr : xhci_plat_runtime_suspend+0x2c/0x38 sp : ffff800011ddbbc0 x29: ffff800011ddbbc0 x28: 0000000000000000 x27: 0000000000000008 x26: ffff800011b28000 x25: ffff80001012b328 x24: ffff800011ddbd48 x23: 0000000000000000 x22: ffff80001076ed78 x21: ffff000177896000 x20: ffff0001714ebce4 x19: ffff000177896000 x18: ffffffffffffffff x17: 0000000000000000 x16: 0000000000000000 x15: ffff800011b288c8 x14: 0000000000000261 x13: 0000000000000001 x12: 0000000000000001 x11: 0000000000000000 x10: 00000000000009c0 x9 : ffff800011ddbd40 x8 : fefefefefefefeff x7 : 0000000000000000 x6 : 000000003ca92688 x5 : 00ffffffffffffff x4 : 001b6b0b00000000 x3 : ffff0001714ebc10 x2 : 0000000000000000 x1 : 0000000000000001 x0 : ffff000177896250 Call trace: xhci_suspend+0x34/0x698 xhci_plat_runtime_suspend+0x2c/0x38 pm_generic_runtime_suspend+0x28/0x40 __rpm_callback+0xd8/0x138 rpm_callback+0x24/0x98 rpm_suspend+0xe0/0x448 rpm_idle+0x124/0x140 pm_runtime_work+0xa0/0xf8 process_one_work+0x1dc/0x370 worker_thread+0x48/0x468 kthread+0xf0/0x120 ret_from_fork+0x10/0x1c
oops2: usb 2-1: USB disconnect, device number 2 xhci-hcd xhci-hcd.1.auto: remove, state 4 usb usb2: USB disconnect, device number 1 xhci-hcd xhci-hcd.1.auto: USB bus 2 deregistered xhci-hcd xhci-hcd.1.auto: remove, state 4 usb usb1: USB disconnect, device number 1 Unable to handle kernel NULL pointer dereference at virtual address 0000000000000138 Mem abort info: ESR = 0x96000004 EC = 0x25: DABT (current EL), IL = 32 bits SET = 0, FnV = 0 EA = 0, S1PTW = 0 Data abort info: ISV = 0, ISS = 0x00000004 CM = 0, WnR = 0 user pgtable: 4k pages, 48-bit VAs, pgdp=00000008b05e8000 [0000000000000138] pgd=0000000000000000, p4d=0000000000000000 Internal error: Oops: 96000004 [#1] PREEMPT SMP Modules linked in: CPU: 2 PID: 7 Comm: kworker/u8:0 Not tainted 5.6.0-rc4-next-20200304-03578-gd6235ff42e2b #101 Hardware name: Freescale i.MX8QXP MEK (DT) Workqueue: 1-0050 tcpm_state_machine_work pstate: 20000005 (nzCv daif -PAN -UAO) pc : xhci_free_dev+0x214/0x270 lr : xhci_plat_runtime_resume+0x78/0x88 sp : ffff80001006b5b0 x29: ffff80001006b5b0 x28: 0000000000000002 x27: ffff00083b74fd48 x26: ffff00083a365580 x25: ffff800010141458 x24: 0000000000000000 x23: 0000000000000000 x22: ffff000837e58138 x21: 0000000000000004 x20: ffff000837e58000 x19: ffff000837e58250 x18: 0000000000000000 x17: 0000000000000000 x16: 0000000000000001 x15: 0000000000000004 x14: ffffffffffffffff x13: 0000000000004ed8 x12: ffff8000123d1000 x11: ffff800012158000 x10: ffff8000123d1360 x9 : ffff800010c74b00 x8 : 0000000000000007 x7 : 00000000000012c2 x6 : ffff8000123d1000 x5 : 0000000000000001 x4 : 0000000000000000 x3 : 0000000000000000 x2 : 0000000000000138 x1 : 0000000000000000 x0 : 0000000000000138 Call trace: xhci_free_dev+0x214/0x270 xhci_plat_runtime_resume+0x78/0x88 pm_generic_runtime_resume+0x30/0x48 __rpm_callback+0x90/0x148 rpm_callback+0x28/0x88 rpm_resume+0x568/0x758 rpm_resume+0x260/0x758 rpm_resume+0x260/0x758 __pm_runtime_resume+0x40/0x88 device_release_driver_internal+0xa0/0x1c8 device_release_driver+0x1c/0x28 bus_remove_device+0xd4/0x158 device_del+0x15c/0x3a0 usb_disable_device+0xb0/0x268 usb_disconnect+0xcc/0x300 usb_remove_hcd+0xf4/0x1dc xhci_plat_remove+0x78/0xe0 platform_drv_remove+0x30/0x50 device_release_driver_internal+0xfc/0x1c8 device_release_driver+0x1c/0x28 bus_remove_device+0xd4/0x158 device_del+0x15c/0x3a0 platform_device_del.part.0+0x20/0x90 platform_device_unregister+0x28/0x40 cdns3_host_exit+0x20/0x40 cdns3_role_stop+0x60/0x90 cdns3_role_set+0x64/0xd8 usb_role_switch_set_role.part.0+0x3c/0x68 usb_role_switch_set_role+0x20/0x30 tcpm_mux_set+0x60/0xf8 tcpm_reset_port+0xa4/0xf0 tcpm_detach.part.0+0x28/0x50 tcpm_state_machine_work+0x12ac/0x2360 process_one_work+0x1c8/0x470 worker_thread+0x50/0x428 kthread+0xfc/0x128 ret_from_fork+0x10/0x18 Code: c8037c02 35ffffa3 17ffe7c3 f9800011 (c85f7c01) ---[ end trace 45b1a173d2679e44 ]---
Cc: Baolin Wang baolin.wang@linaro.org Cc: Mathias Nyman mathias.nyman@linux.intel.com Cc: stable@vger.kernel.org Fixes: b0c69b4bace3 ("usb: host: plat: Enable xHCI plat runtime PM") Reviewed-by: Peter Chen peter.chen@nxp.com Signed-off-by: Li Jun jun.li@nxp.com --- drivers/usb/host/xhci-plat.c | 1 + 1 file changed, 1 insertion(+)
diff --git a/drivers/usb/host/xhci-plat.c b/drivers/usb/host/xhci-plat.c index 1d4f6f85f0fe..f38d53528c96 100644 --- a/drivers/usb/host/xhci-plat.c +++ b/drivers/usb/host/xhci-plat.c @@ -362,6 +362,7 @@ static int xhci_plat_remove(struct platform_device *dev) struct clk *reg_clk = xhci->reg_clk; struct usb_hcd *shared_hcd = xhci->shared_hcd;
+ pm_runtime_get_sync(&dev->dev); xhci->xhc_state |= XHCI_STATE_REMOVING;
usb_remove_hcd(shared_hcd);
Hi,
On 5/12/2020 8:05 AM, Peter Chen wrote:
Cc: Baolin Wang baolin.wang@linaro.org Cc: Mathias Nyman mathias.nyman@linux.intel.com Cc: stable@vger.kernel.org Fixes: b0c69b4bace3 ("usb: host: plat: Enable xHCI plat runtime PM") Reviewed-by: Peter Chen peter.chen@nxp.com Signed-off-by: Li Jun jun.li@nxp.com
drivers/usb/host/xhci-plat.c | 1 + 1 file changed, 1 insertion(+)
diff --git a/drivers/usb/host/xhci-plat.c b/drivers/usb/host/xhci-plat.c index 1d4f6f85f0fe..f38d53528c96 100644 --- a/drivers/usb/host/xhci-plat.c +++ b/drivers/usb/host/xhci-plat.c @@ -362,6 +362,7 @@ static int xhci_plat_remove(struct platform_device *dev) struct clk *reg_clk = xhci->reg_clk; struct usb_hcd *shared_hcd = xhci->shared_hcd;
- pm_runtime_get_sync(&dev->dev);
Where is corresponding _put() call?
xhci->xhc_state |= XHCI_STATE_REMOVING; usb_remove_hcd(shared_hcd);
On 5/12/2020 8:05 AM, Peter Chen wrote:
Cc: Baolin Wang baolin.wang@linaro.org Cc: Mathias Nyman mathias.nyman@linux.intel.com Cc: stable@vger.kernel.org Fixes: b0c69b4bace3 ("usb: host: plat: Enable xHCI plat runtime PM") Reviewed-by: Peter Chen peter.chen@nxp.com Signed-off-by: Li Jun jun.li@nxp.com
drivers/usb/host/xhci-plat.c | 1 + 1 file changed, 1 insertion(+)
diff --git a/drivers/usb/host/xhci-plat.c b/drivers/usb/host/xhci-plat.c index 1d4f6f85f0fe..f38d53528c96 100644 --- a/drivers/usb/host/xhci-plat.c +++ b/drivers/usb/host/xhci-plat.c @@ -362,6 +362,7 @@ static int xhci_plat_remove(struct platform_device *dev) struct clk *reg_clk = xhci->reg_clk; struct usb_hcd *shared_hcd = xhci->shared_hcd;
- pm_runtime_get_sync(&dev->dev);
Where is corresponding _put() call?
At the end of this function, there is a pm_runtime_set_suspended(&dev->dev). Calling pm_runtime_put_sync(&dev->dev) will cause xhci_plat_runtime_suspend is called which is not expected.
Peter
On 12.5.2020 7.03, Peter Chen wrote:
On 5/12/2020 8:05 AM, Peter Chen wrote:
Cc: Baolin Wang baolin.wang@linaro.org Cc: Mathias Nyman mathias.nyman@linux.intel.com Cc: stable@vger.kernel.org Fixes: b0c69b4bace3 ("usb: host: plat: Enable xHCI plat runtime PM") Reviewed-by: Peter Chen peter.chen@nxp.com Signed-off-by: Li Jun jun.li@nxp.com --- drivers/usb/host/xhci-plat.c | 1 + 1 file changed, 1 insertion(+)
diff --git a/drivers/usb/host/xhci-plat.c b/drivers/usb/host/xhci-plat.c index 1d4f6f85f0fe..f38d53528c96 100644 --- a/drivers/usb/host/xhci-plat.c +++ b/drivers/usb/host/xhci-plat.c @@ -362,6 +362,7 @@ static int xhci_plat_remove(struct platform_device *dev) struct clk *reg_clk = xhci->reg_clk; struct usb_hcd *shared_hcd = xhci->shared_hcd;
- pm_runtime_get_sync(&dev->dev);
Where is corresponding _put() call?
At the end of this function, there is a pm_runtime_set_suspended(&dev->dev). Calling pm_runtime_put_sync(&dev->dev) will cause xhci_plat_runtime_suspend is called which is not expected.
Peter
This seems odd, I don't understand why pm_runtime_set_suspended() would call pm_runtime_put_sync()? I thought pm_runtime_set_suspended() is used to let pm core know the hardware is suspended, and inform the parent of this, possibly allowing parent to runtime suspend. Not sure if it's needed in the remove function. It makes sense to allow the parent to suspend, but xhci isn't really suspended. Yes is stopped, but we can't resume our way back to a active state.
Could it be that the runtime suspend call you are seeing is because we are removing all child devices, disconnecting and freeing everything, including roothubs and hcd's. Maybe runtime pm should be disabled a bit earlier.
Currently probe and remove looks like:
xhci_plat_probe() pm_runtime_set_active() pm_runtime_enable() pm_runtime_get_noresume() ... pm_runtime_put_noidle() pm_runtime_forbid()
xhci_plat_remove() <remove and put both hcd's> pm_runtime_set_suspended() pm_runtime_disable()
Would it make sense to change xhci_plat_remove() to
xhci_plat_remove() pm_runtime_disable() <remove and put both hcd's> pm_runtime_set_suspended()
or possibly wrapping the remove in a runtime get/put:
xhci_plat_remove() pm_runtime_get_noresume() pm_runtime_disable() <remove and put both hcd's> pm_runtime_set_suspended() pm_runtime_put_noidle()
-Mathias
On 5/12/2020 8:05 AM, Peter Chen wrote:
Cc: Baolin Wang baolin.wang@linaro.org Cc: Mathias Nyman mathias.nyman@linux.intel.com Cc: stable@vger.kernel.org Fixes: b0c69b4bace3 ("usb: host: plat: Enable xHCI plat runtime PM") Reviewed-by: Peter Chen peter.chen@nxp.com Signed-off-by: Li Jun jun.li@nxp.com --- drivers/usb/host/xhci-plat.c | 1 + 1 file changed, 1 insertion(+)
diff --git a/drivers/usb/host/xhci-plat.c b/drivers/usb/host/xhci-plat.c index 1d4f6f85f0fe..f38d53528c96 100644 --- a/drivers/usb/host/xhci-plat.c +++ b/drivers/usb/host/xhci-plat.c @@ -362,6 +362,7 @@ static int xhci_plat_remove(struct platform_device *dev) struct clk *reg_clk = xhci->reg_clk; struct usb_hcd *shared_hcd = xhci->shared_hcd;
- pm_runtime_get_sync(&dev->dev);
Where is corresponding _put() call?
At the end of this function, there is a pm_runtime_set_suspended(&dev->dev). Calling pm_runtime_put_sync(&dev->dev) will cause xhci_plat_runtime_suspend is called which is not expected.
Peter
This seems odd, I don't understand why pm_runtime_set_suspended() would call pm_runtime_put_sync()? I thought pm_runtime_set_suspended() is used to let pm core know the hardware is suspended, and inform the parent of this, possibly allowing parent to runtime suspend.
Sorry, I may let you misunderstand. I mean call pm_runtime_set_suspended will NOT call xhci_plat_runtime_suspend , and only let its parents be runtime suspend. And if call pm_runtime_put_sync instead of pm_runtime_set_suspended , the xhci_plat_runtime_suspend is called, although it doesn't cause any issues, but hcd has been removed at the time, it is should not call xhci_suspend after that.
Not sure if it's needed in the remove function. It makes sense to allow the parent to suspend, but xhci isn't really suspended. Yes is stopped, but we can't resume our way back to a active state.
Could it be that the runtime suspend call you are seeing is because we are removing all child devices, disconnecting and freeing everything, including roothubs and hcd's.
This issue occurs plug out Type-C-to-A cable with USB3 device connected from Type-C port. From what I see it most probably dues to disconnect event triggers roothub auto-suspend, and this auto-suspend is async, if it is scheduled later than hcd is removed, this oops occurs.
Below are several logs I capture:
1. whole debug message when this issue occurs:
[ 93.889693] xhci-hcd xhci-hcd.1.auto: remove, state 1 [ 93.889911] xhci-hcd xhci-hcd.1.auto: Port change event, 2-1, id 2, portsc: 0x4202c0 [ 93.894774] xhci-hcd xhci-hcd.1.auto: handle_port_status: starting port polling. [ 93.894830] xhci-hcd xhci-hcd.1.auto: roothub graceful disconnect [ 93.894853] usb usb2: USB disconnect, device number 1 [ 93.896876] hub 2-0:1.0: state 0 ports 1 chg 0000 evt 0002 [ 93.899940] usb 2-1: USB disconnect, device number 2 [ 93.904985] usb 2-1: unregistering device [ 93.904996] usb 2-1: unregistering interface 2-1:1.0 [ 93.954023] usb 2-1: usb_disable_device nuking all URBs [ 93.954050] xhci-hcd xhci-hcd.1.auto: xhci_drop_endpoint called for udev 00000000e6468cd7 [ 93.954138] xhci-hcd xhci-hcd.1.auto: drop ep 0x81, slot id 1, new drop flags = 0x8, new add flags = 0x0 [ 93.954146] xhci-hcd xhci-hcd.1.auto: xhci_drop_endpoint called for udev 00000000e6468cd7 [ 93.954197] xhci-hcd xhci-hcd.1.auto: drop ep 0x2, slot id 1, new drop flags = 0x18, new add flags = 0x0 [ 93.954652] usb usb2-port1: usb_disconnect: call pm_runtime_put [ 93.954786] xhci-hcd xhci-hcd.1.auto: // Ding dong! [ 93.954851] usb usb2: unregistering device [ 93.954863] usb usb2: unregistering interface 2-0:1.0 [ 93.955252] usb usb2: usb_disable_device nuking all URBs [ 93.955264] xHCI xhci_drop_endpoint called for root hub [ 93.955269] xHCI xhci_check_bandwidth called for root hub [ 93.955901] xhci-hcd xhci-hcd.1.auto: usb_remove_hcd:3016 [ 93.955914] xhci-hcd xhci-hcd.1.auto: USB bus 2 deregistered [ 93.966119] xhci-hcd xhci-hcd.1.auto: usb_remove_hcd:3037 [ 93.966130] xhci-hcd xhci-hcd.1.auto: remove, state 4 [ 93.971434] xhci-hcd xhci-hcd.1.auto: roothub graceful disconnect [ 93.971452] usb usb1: USB disconnect, device number 1 [ 93.976783] xhci-hcd xhci-hcd.1.auto: at xhci_plat_runtime_suspend [ 93.976807] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000240 [ 93.979340] usb usb1: unregistering device [ 93.985784] usb usb1: unregistering interface 1-0:1.0 [ 93.986064] Mem abort info: [ 93.989013] ESR = 0x96000004 [ 93.992222] EC = 0x25: DABT (current EL), IL = 32 bits [ 93.997735] SET = 0, FnV = 0 [ 94.000840] EA = 0, S1PTW = 0 [ 94.004125] Data abort info: [ 94.007220] ISV = 0, ISS = 0x00000004 [ 94.011255] CM = 0, WnR = 0 [ 94.014424] user pgtable: 4k pages, 48-bit VAs, pgdp=00000008b6dd3000 [ 94.021025] [0000000000000240] pgd=0000000000000000, p4d=0000000000000000 [ 94.028025] Internal error: Oops: 96000004 [#1] PREEMPT SMP [ 94.033605] Modules linked in: [ 94.036677] CPU: 0 PID: 121 Comm: kworker/0:2 Not tainted 5.6.0-rc4-next-20200304-03579-gfe93ddfb2c17-dirty #120 [ 94.047102] Hardware name: Freescale i.MX8QXP MEK (DT) [ 94.052261] Workqueue: pm pm_runtime_work [ 94.056280] pstate: 60000005 (nZCv daif -PAN -UAO) [ 94.061080] pc : xhci_suspend+0x30/0x698 [ 94.065013] lr : xhci_plat_runtime_suspend+0x80/0x90 [ 94.069976] sp : ffff80001243bbb0 [ 94.073294] x29: ffff80001243bbb0 x28: ffff800011c56000 [ 94.078611] x27: 0000000000000008 x26: ffff80001243bd38 [ 94.083928] x25: ffff8000101412d8 x24: 0000000000000000 [ 94.089244] x23: 0000000000000000 x22: ffff000837e8e104 [ 94.094562] x21: ffff80001085d058 x20: ffff000836b42000 [ 94.099878] x19: ffff000836b42250 x18: 0000000000000010 [ 94.105194] x17: 0000000000000000 x16: 0000000000000000 [ 94.110512] x15: ffff00083ace0470 x14: 8810000000000000 [ 94.115828] x13: 0000000000001b58 x12: ffff800011ef1000 [ 94.121145] x11: ffff800011c78000 x10: ffff800011ef1360 [ 94.126462] x9 : ffff800010c73d28 x8 : 0000000000000007 [ 94.131778] x7 : 0000000000000531 x6 : ffff800011ef1000 [ 94.137095] x5 : 0000000000000000 x4 : ffff00083f99b1b8 [ 94.142412] x3 : ffff00083f9aa208 x2 : fd7e7f70d3ccaf00 [ 94.147729] x1 : 0000000000000001 x0 : 0000000000000000 [ 94.153044] Call trace: [ 94.155505] xhci_suspend+0x30/0x698 [ 94.159090] xhci_plat_runtime_suspend+0x80/0x90 [ 94.163722] pm_generic_runtime_suspend+0x30/0x48 [ 94.168437] __rpm_callback+0x90/0x148 [ 94.172189] rpm_callback+0x28/0x88 [ 94.175682] rpm_suspend+0x100/0x5f0 [ 94.179260] rpm_idle+0x28c/0x420 [ 94.182579] pm_runtime_work+0xa0/0xc0 [ 94.186334] process_one_work+0x1c8/0x470 [ 94.190345] worker_thread+0x50/0x428 [ 94.194013] kthread+0xfc/0x128 [ 94.197158] ret_from_fork+0x10/0x18 [ 94.200741] Code: 34001f00 7100101f 54003141 f9400660 (b9424000) [ 94.206844] ---[ end trace 3ba7d185e60c149e ]---
2 & 3 are related debug message and trace message (I can't get the whole debug message due to console is stuck when the issue occurs)
2.
soot@imx8qmmek:~# echo 500 > /sys/bus/usb/devices/usb2/power/autosuspend_delay_m usb/devices/usb2/power/autosuspend_delay_ms[ 206.901805] xhci-hcd xhci-hcd.1.auto: remove, state 1 [ 206.906941] usb usb2: USB disconnect, device number 1ower/autosuspend_delay_ms [ 206.958838] usb 2-1: USB disconnect, device number 2 [ 206.994172] xhci-hcd xhci-hcd.1.auto: USB bus 2 deregistered [ 207.000017] xhci-hcd xhci-hcd.1.auto: remove, state 4 [ 207.005213] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000240 [ 207.014082] usb usb1: USB disconnect, device number 1 [ 207.014106] Mem abort info: [ 207.022005] ESR = 0x96000004 [ 207.025138] EC = 0x25: DABT (current EL), IL = 32 bits [ 207.030510] SET = 0, FnV = 0 [ 207.033598] EA = 0, S1PTW = 0 [ 207.036777] Data abort info: [ 207.039699] ISV = 0, ISS = 0x00000004 [ 207.043569] CM = 0, WnR = 0 [ 207.046549] user pgtable: 4k pages, 48-bit VAs, pgdp=00000008b6f19000
3.
kworker/0:5-411 [000] d..1 206.991613: rpm_idle: 2-1:1.0 flags-1 cnt-0 dep-0 auto-1 p-0 irq-0 child-0 kworker/0:5-411 [000] d..1 206.991626: rpm_return_int: rpm_idle+0xa8/0x420:2-1:1.0 ret=0 kworker/0:5-411 [000] d..1 206.991629: rpm_suspend: 2-1:1.0 flags-9 cnt-0 dep-0 auto-1 p-0 irq-0 child-0 kworker/0:5-411 [000] d..1 206.991634: rpm_idle: 2-1 flags-1 cnt-2 dep-0 auto-0 p-0 irq-0 child-0 kworker/0:5-411 [000] d..1 206.991636: rpm_return_int: rpm_idle+0xa8/0x420:2-1 ret=-11 kworker/0:5-411 [000] d..1 206.991638: rpm_return_int: rpm_suspend+0x174/0x5f0:2-1:1.0 ret=0 kworker/0:5-411 [000] .... 206.991839: rpm_usage: 2-1 flags-c cnt-1 dep-0 auto-0 p-0 irq-0 child-0 kworker/0:5-411 [000] d..1 206.992523: rpm_resume: 2-1 flags-4 cnt-2 dep-0 auto-0 p-0 irq-0 child-0 kworker/0:5-411 [000] d..1 206.992534: rpm_return_int: rpm_resume+0x114/0x758:2-1 ret=1 kworker/0:5-411 [000] .... 206.992548: rpm_usage: 2-1 flags-4 cnt-1 dep-0 auto-0 p-0 irq-0 child-0 kworker/0:5-411 [000] d..1 206.992618: rpm_idle: usb2 flags-1 cnt-0 dep-0 auto-1 p-0 irq-0 child-1 kworker/0:5-411 [000] d..1 206.992627: rpm_return_int: rpm_idle+0xa8/0x420:usb2 ret=-16 kworker/0:5-411 [000] .... 206.992690: rpm_usage: usb2-port1 flags-5 cnt-2 dep-0 auto-1 p-0 irq-0 child-0 kworker/0:5-411 [000] .... 206.992892: rpm_usage: usb2-port1 flags-4 cnt-1 dep-0 auto-1 p-0 irq-0 child-0 kworker/0:5-411 [000] d..1 206.992924: rpm_idle: 2-0:1.0 flags-4 cnt-0 dep-0 auto-1 p-0 irq-0 child-1 kworker/0:5-411 [000] d..1 206.992928: rpm_return_int: rpm_idle+0xa8/0x420:2-0:1.0 ret=0 kworker/0:5-411 [000] d..1 206.992931: rpm_suspend: 2-0:1.0 flags-c cnt-0 dep-0 auto-1 p-0 irq-0 child-1 kworker/0:5-411 [000] d..1 206.992936: rpm_idle: usb2 flags-1 cnt-0 dep-0 auto-1 p-0 irq-0 child-0 kworker/0:5-411 [000] d..1 206.992944: rpm_return_int: rpm_idle+0x2c8/0x420:usb2 ret=0 kworker/0:5-411 [000] d..1 206.992946: rpm_return_int: rpm_suspend+0x174/0x5f0:2-0:1.0 ret=0 kworker/0:5-411 [000] d..1 206.993033: rpm_idle: usb2 flags-2 cnt-0 dep-0 auto-1 p-0 irq-0 child-0 kworker/0:5-411 [000] d..1 206.993038: rpm_return_int: rpm_idle+0xa8/0x420:usb2 ret=-16 kworker/u8:5-409 [002] d..1 206.993112: rpm_resume: 2-0:1.0 flags-4 cnt-1 dep-0 auto-1 p-0 irq-0 child-1 kworker/u8:5-409 [002] d..1 206.993118: rpm_idle: 2-0:1.0 flags-1 cnt-1 dep-0 auto-1 p-0 irq-0 child-1 kworker/u8:5-409 [002] d..1 206.993121: rpm_return_int: rpm_idle+0xa8/0x420:2-0:1.0 ret=-11 kworker/u8:5-409 [002] d..1 206.993123: rpm_return_int: rpm_resume+0x114/0x758:2-0:1.0 ret=1 kworker/u8:5-409 [002] d..1 206.993135: rpm_idle: 2-0:1.0 flags-4 cnt-0 dep-0 auto-1 p-0 irq-0 child-1 kworker/u8:5-409 [002] d..1 206.993138: rpm_return_int: rpm_idle+0xa8/0x420:2-0:1.0 ret=0 kworker/u8:5-409 [002] d..1 206.993140: rpm_suspend: 2-0:1.0 flags-c cnt-0 dep-0 auto-1 p-0 irq-0 child-1 kworker/u8:5-409 [002] d..1 206.993143: rpm_idle: usb2 flags-1 cnt-0 dep-0 auto-1 p-0 irq-0 child-0 kworker/u8:5-409 [002] d..1 206.993159: rpm_return_int: rpm_idle+0x2c8/0x420:usb2 ret=0 kworker/u8:5-409 [002] d..1 206.993161: rpm_return_int: rpm_suspend+0x174/0x5f0:2-0:1.0 ret=0 kworker/u8:5-409 [002] d..1 206.993165: rpm_resume: usb2 flags-4 cnt-1 dep-0 auto-1 p-1 irq-0 child-0 kworker/u8:5-409 [002] d..1 206.993167: rpm_return_int: rpm_resume+0x114/0x758:usb2 ret=1 kworker/u8:5-409 [003] d..1 206.993257: rpm_resume: usb1-port1 flags-4 cnt-1 dep-0 auto-1 p-0 irq-0 child-0 kworker/u8:5-409 [003] d..1 206.993261: rpm_return_int: rpm_resume+0x114/0x758:usb1-port1 ret=1 kworker/u8:5-409 [003] d..1 206.993275: rpm_idle: usb2-port1 flags-5 cnt-0 dep-0 auto-1 p-0 irq-0 child-0 kworker/u8:5-409 [003] d..1 206.993288: rpm_return_int: rpm_idle+0x2c8/0x420:usb2-port1 ret=0 kworker/u8:5-409 [003] d..1 206.993291: rpm_idle: usb1-port1 flags-5 cnt-0 dep-0 auto-1 p-0 irq-0 child-0 kworker/u8:5-409 [003] d..1 206.993295: rpm_return_int: rpm_idle+0x2c8/0x420:usb1-port1 ret=0 kworker/u8:5-409 [003] d..1 206.993383: rpm_suspend: usb2 flags-c cnt-0 dep-0 auto-1 p-0 irq-0 child-0 kworker/u8:5-409 [003] d..1 206.993392: rpm_return_int: rpm_suspend+0x174/0x5f0:usb2 ret=0 kworker/3:2-208 [003] d..1 206.993770: rpm_idle: usb1-port1 flags-2 cnt-0 dep-0 auto-1 p-0 irq-0 child-0 kworker/3:2-208 [003] d..1 206.993775: rpm_return_int: rpm_idle+0xa8/0x420:usb1-port1 ret=0 kworker/3:2-208 [003] d..1 206.993778: rpm_suspend: usb1-port1 flags-a cnt-0 dep-0 auto-1 p-0 irq-0 child-0 kworker/3:2-208 [003] d..1 206.993785: rpm_return_int: rpm_suspend+0x174/0x5f0:usb1-port1 ret=-11 kworker/u8:5-409 [003] d..1 206.993956: rpm_resume: usb2 flags-4 cnt-1 dep-0 auto-1 p-0 irq-0 child-0 kworker/u8:5-409 [003] d..1 206.993964: rpm_return_int: rpm_resume+0x114/0x758:usb2 ret=1 kworker/u8:5-409 [003] d..1 206.993977: rpm_idle: usb2 flags-4 cnt-0 dep-0 auto-1 p-0 irq-0 child-0 kworker/u8:5-409 [003] d..1 206.993982: rpm_return_int: rpm_idle+0xa8/0x420:usb2 ret=-16 kworker/u8:5-409 [003] d..1 206.994064: rpm_idle: xhci-hcd.1.auto flags-1 cnt-0 dep-0 auto-1 p-0 irq-0 child -0
kworker/3:2-208 [003] d..1 207.005157: rpm_idle: xhci-hcd.1.auto flags-2 cnt-0 dep-0 auto-1 p-0 irq-0 child -0 kworker/3:2-208 [003] d..1 207.005169: rpm_return_int: rpm_idle+0xa8/0x420:xhci-hcd.1.auto ret=0 kworker/3:2-208 [003] d..1 207.005172: rpm_suspend: xhci-hcd.1.auto flags-a cnt-0 dep-0 auto-1 p-0 irq-0 ch ild-0 kworker/u8:5-409 [002] d..1 207.022142: rpm_resume: 1-0:1.0 flags-4 cnt-1 dep-0 auto-1 p-0 irq-0 child-1 kworker/u8:5-409 [002] d..1 207.022153: rpm_resume: usb1 flags-0 cnt-1 dep-0 auto-1 p-0 irq-0 child-0 kworker/u8:5-409 [002] d..1 207.022155: rpm_resume: xhci-hcd.1.auto flags-0 cnt-1 dep-0 auto-1 p-0 irq-0 chi ld-0
Currently probe and remove looks like:
xhci_plat_probe() pm_runtime_set_active() pm_runtime_enable() pm_runtime_get_noresume() ... pm_runtime_put_noidle() pm_runtime_forbid()
xhci_plat_remove() <remove and put both hcd's> pm_runtime_set_suspended() pm_runtime_disable()
Would it make sense to change xhci_plat_remove() to
xhci_plat_remove() pm_runtime_disable() <remove and put both hcd's> pm_runtime_set_suspended()
I think this makes sense and tested at my platform.
Peter
...
This seems odd,
I don't understand why pm_runtime_set_suspended() would call pm_runtime_put_sync()?
I thought pm_runtime_set_suspended() is used to let pm core know the hardware is suspended,
and inform the parent of this, possibly allowing parent to runtime suspend.
Not sure if it's needed in the remove function.
It makes sense to allow the parent to suspend, but xhci isn't really suspended.
Yes is stopped, but we can't resume our way back to a active state.
Could it be that the runtime suspend call you are seeing is because we are removing all child devices,
disconnecting and freeing everything, including roothubs and hcd's.
Maybe runtime pm should be disabled a bit earlier.
Currently probe and remove looks like:
xhci_plat_probe()
pm_runtime_set_active()
pm_runtime_enable()
pm_runtime_get_noresume()
...
pm_runtime_put_noidle()
pm_runtime_forbid()
xhci_plat_remove()
<remove and put both hcd's>
pm_runtime_set_suspended()
pm_runtime_disable()
Would it make sense to change xhci_plat_remove() to
xhci_plat_remove()
pm_runtime_disable()
<remove and put both hcd's>
pm_runtime_set_suspended()
or possibly wrapping the remove in a runtime get/put:
xhci_plat_remove()
pm_runtime_get_noresume()
pm_runtime_disable()
> <remove and put both hcd's>
> pm_runtime_set_suspended()
> pm_runtime_put_noidle()
I think it's better to keep runtime active during driver removal, how about this:
pm_runtime_get_sync() <remove and put both hcd's> pm_runtime_disable() pm_runtime_put_noidle() pm_runtime_set_suspended()
thanks Li Jun
-Mathias
On 20-05-13 14:45:42, Jun Li wrote:
...
Would it make sense to change xhci_plat_remove() to
xhci_plat_remove()
pm_runtime_disable()
<remove and put both hcd's>
pm_runtime_set_suspended()
or possibly wrapping the remove in a runtime get/put:
xhci_plat_remove()
pm_runtime_get_noresume()
pm_runtime_disable()
> <remove and put both hcd's>
> pm_runtime_set_suspended()
> pm_runtime_put_noidle()
I think it's better to keep runtime active during driver removal, how about this:
pm_runtime_get_sync() <remove and put both hcd's> pm_runtime_disable() pm_runtime_put_noidle() pm_runtime_set_suspended()
I think it is more reasonable since for some DRD controllers if DRD core is suspended, access the xHCI register (eg, we remove xhci-plat-hcd module at the time) may hang the system. Alan & Mathias, what's your opinion?
On Thu, May 14, 2020 at 01:31:59AM +0000, Peter Chen wrote:
On 20-05-13 14:45:42, Jun Li wrote:
...
Would it make sense to change xhci_plat_remove() to
xhci_plat_remove()
pm_runtime_disable()
<remove and put both hcd's>
pm_runtime_set_suspended()
or possibly wrapping the remove in a runtime get/put:
xhci_plat_remove()
pm_runtime_get_noresume()
pm_runtime_disable()
> <remove and put both hcd's>
> pm_runtime_set_suspended()
> pm_runtime_put_noidle()
I think it's better to keep runtime active during driver removal, how about this:
pm_runtime_get_sync() <remove and put both hcd's> pm_runtime_disable() pm_runtime_put_noidle() pm_runtime_set_suspended()
I think it is more reasonable since for some DRD controllers if DRD core is suspended, access the xHCI register (eg, we remove xhci-plat-hcd module at the time) may hang the system. Alan & Mathias, what's your opinion?
Jun's suggestion looks good to me.
Alan Stern
On 14.5.2020 4.34, Alan Stern wrote:
On Thu, May 14, 2020 at 01:31:59AM +0000, Peter Chen wrote:
On 20-05-13 14:45:42, Jun Li wrote:
...
Would it make sense to change xhci_plat_remove() to
xhci_plat_remove()
pm_runtime_disable()
<remove and put both hcd's>
pm_runtime_set_suspended()
or possibly wrapping the remove in a runtime get/put:
xhci_plat_remove()
pm_runtime_get_noresume()
pm_runtime_disable()
> <remove and put both hcd's>
> pm_runtime_set_suspended()
> pm_runtime_put_noidle()
I think it's better to keep runtime active during driver removal, how about this:
pm_runtime_get_sync() <remove and put both hcd's> pm_runtime_disable() pm_runtime_put_noidle() pm_runtime_set_suspended()
I think it is more reasonable since for some DRD controllers if DRD core is suspended, access the xHCI register (eg, we remove xhci-plat-hcd module at the time) may hang the system. Alan & Mathias, what's your opinion?
Makes sense to me
Jun's suggestion looks good to me.
Alan Stern
Great, lets go with this then. Jun, or Peter, could you turn this into a patch and check that it works? I only got PCI xHC's to play with myself.
Thanks -Mathias
xhci_plat_remove()
pm_runtime_disable()
<remove and put both hcd's>
pm_runtime_set_suspended()
or possibly wrapping the remove in a runtime get/put:
xhci_plat_remove()
pm_runtime_get_noresume()
pm_runtime_disable()
> <remove and put both hcd's>
> pm_runtime_set_suspended()
> pm_runtime_put_noidle()
I think it's better to keep runtime active during driver removal, how about this:
pm_runtime_get_sync() <remove and put both hcd's> pm_runtime_disable() pm_runtime_put_noidle() pm_runtime_set_suspended()
I think it is more reasonable since for some DRD controllers if DRD core is suspended, access the xHCI register (eg, we remove xhci-plat-hcd module at the time) may hang the system. Alan & Mathias, what's your opinion?
Makes sense to me
Jun's suggestion looks good to me.
Alan Stern
Great, lets go with this then. Jun, or Peter, could you turn this into a patch and check that it works? I only got PCI xHC's to play with myself.
Jun, would you please create a patch for it. I have tested it at CDNS3 platform, feel free add my tag.
Reviewed-by: Peter Chen peter.chen@nxp.com Tested-by: Peter Chen peter.chen@nxp.com
Peter
-----Original Message----- From: Peter Chen peter.chen@nxp.com Sent: 2020年5月14日 17:11 To: Mathias Nyman mathias.nyman@linux.intel.com; Alan Stern stern@rowland.harvard.edu Cc: Jun Li jun.li@nxp.com; Manu Gautam mgautam@codeaurora.org; mathias.nyman@intel.com; linux-usb@vger.kernel.org; gregkh@linuxfoundation.org; dl-linux-imx linux-imx@nxp.com; Baolin Wang baolin.wang@linaro.org; stable@vger.kernel.org Subject: RE: 回复: [PATCH 1/1] usb: host: xhci-plat: keep runtime active when remove host
xhci_plat_remove()
pm_runtime_disable()
<remove and put both hcd's>
pm_runtime_set_suspended()
or possibly wrapping the remove in a runtime get/put:
xhci_plat_remove()
pm_runtime_get_noresume()
pm_runtime_disable()
> <remove and put both hcd's>
> pm_runtime_set_suspended()
> pm_runtime_put_noidle()
I think it's better to keep runtime active during driver removal, how about this:
pm_runtime_get_sync() <remove and put both hcd's> pm_runtime_disable() pm_runtime_put_noidle() pm_runtime_set_suspended()
I think it is more reasonable since for some DRD controllers if DRD core is suspended, access the xHCI register (eg, we remove xhci-plat-hcd module at the time) may hang the system. Alan & Mathias, what's your opinion?
Makes sense to me
Jun's suggestion looks good to me.
Alan Stern
Great, lets go with this then. Jun, or Peter, could you turn this into a patch and check that it works? I only got PCI xHC's to play with myself.
Jun, would you please create a patch for it. I have tested it at CDNS3 platform, feel free add my tag.
Reviewed-by: Peter Chen peter.chen@nxp.com Tested-by: Peter Chen peter.chen@nxp.com
Thanks, I will send out a v2 for this.
Li Jun
Peter
On Tue, 12 May 2020, Peter Chen wrote:
From: Li Jun jun.li@nxp.com
While remove host(e.g. for USB role switch from host to device), if runtime pm is enabled by user, below oops are occurs at dwc3 and cdns3 platform. Keep the xhci-plat device being active during remove host fixes them.
...
diff --git a/drivers/usb/host/xhci-plat.c b/drivers/usb/host/xhci-plat.c index 1d4f6f85f0fe..f38d53528c96 100644 --- a/drivers/usb/host/xhci-plat.c +++ b/drivers/usb/host/xhci-plat.c @@ -362,6 +362,7 @@ static int xhci_plat_remove(struct platform_device *dev) struct clk *reg_clk = xhci->reg_clk; struct usb_hcd *shared_hcd = xhci->shared_hcd;
- pm_runtime_get_sync(&dev->dev); xhci->xhc_state |= XHCI_STATE_REMOVING;
usb_remove_hcd(shared_hcd);
You mustn't add a pm_runtime_get call without a corresponding pm_runtime_put call.
With just this one call, if the role switched from host to device and then back to host, then the host would never be able to go into runtime suspend.
In this case the correspondence between the get's and the put's will probably be obscure; some comments would help.
Alan Stern
diff --git a/drivers/usb/host/xhci-plat.c b/drivers/usb/host/xhci-plat.c index 1d4f6f85f0fe..f38d53528c96 100644 --- a/drivers/usb/host/xhci-plat.c +++ b/drivers/usb/host/xhci-plat.c @@ -362,6 +362,7 @@ static int xhci_plat_remove(struct platform_device *dev) struct clk *reg_clk = xhci->reg_clk; struct usb_hcd *shared_hcd = xhci->shared_hcd;
pm_runtime_get_sync(&dev->dev); xhci->xhc_state |= XHCI_STATE_REMOVING;
usb_remove_hcd(shared_hcd);
You mustn't add a pm_runtime_get call without a corresponding pm_runtime_put call.
With just this one call, if the role switched from host to device and then back to host, then the host would never be able to go into runtime suspend.
I may not consider carefully for other cases, for my case, the xhci-plat device will be removed, and re-created. But if we remove the driver using modprobe, it may have issues.
In this case the correspondence between the get's and the put's will probably be obscure; some comments would help.
I explained at the reply for Mathias's, but I am not 100% it is the root cause.
Peter
On Wed, May 13, 2020 at 11:08:13AM +0000, Peter Chen wrote:
diff --git a/drivers/usb/host/xhci-plat.c b/drivers/usb/host/xhci-plat.c index 1d4f6f85f0fe..f38d53528c96 100644 --- a/drivers/usb/host/xhci-plat.c +++ b/drivers/usb/host/xhci-plat.c @@ -362,6 +362,7 @@ static int xhci_plat_remove(struct platform_device *dev) struct clk *reg_clk = xhci->reg_clk; struct usb_hcd *shared_hcd = xhci->shared_hcd;
pm_runtime_get_sync(&dev->dev); xhci->xhc_state |= XHCI_STATE_REMOVING;
usb_remove_hcd(shared_hcd);
You mustn't add a pm_runtime_get call without a corresponding pm_runtime_put call.
With just this one call, if the role switched from host to device and then back to host, then the host would never be able to go into runtime suspend.
I may not consider carefully for other cases, for my case, the xhci-plat device will be removed, and re-created. But if we remove the driver using modprobe, it may have issues.
In this case the correspondence between the get's and the put's will probably be obscure; some comments would help.
I explained at the reply for Mathias's, but I am not 100% it is the root cause.
Okay, but there's something you may not know: pm_runtime_set_suspended() should be called _after_ pm_runtime_disable(). If you try to set the state to "suspended" while the device is enabled for runtime PM, the call may very well fail.
In this case, if all you want to do is block callbacks to the xHCI suspend routine, pm_runtime_disable() is probably all you need.
Alan Stern
linux-stable-mirror@lists.linaro.org