Hello all,
I recently hit a regression on v6.4-rc1, and it seems that the exact same issue is now also happening on v6.1.28.
I have not been able to bisect it yet, but what is happening is that libusbgx[1], which we use to configure a USB NCM gadget interface[2][3], just hangs completely at boot.
This is happening on multiple ARM32 and ARM64 i.MX SoCs (i.MX6, i.MX7, i.MX8MM).
The log looks something like this:
```
[*] A start job is running for Load def…t schema g1.schema (6s / no limit)
[**] A start job is running for Load def…t schema g1.schema (7s / no limit)
[***] A start job is running for Load def…t schema g1.schema (8s / no limit)
[ ***] A start job is running for Load def…t schema g1.schema (8s / no limit)
```
I will try to bisect this and provide more useful feedback ASAP. I decided not to wait for that and to send this email now, in case someone has some insight into what is going on.
Francesco
[1] https://github.com/linux-usb-gadgets/libusbgx
[2] https://git.toradex.com/cgit/meta-toradex-bsp-common.git/tree/recipes-suppor...
[3] https://git.toradex.com/cgit/meta-toradex-bsp-common.git/tree/recipes-suppor...
On Fri, May 12, 2023 at 11:07:10AM +0200, Francesco Dolcini wrote:
> I recently hit a regression on v6.4-rc1, and it seems that the exact same issue is now also happening on v6.1.28.
> I have not been able to bisect it yet, but what is happening is that libusbgx[1], which we use to configure a USB NCM gadget interface[2][3], just hangs completely at boot.
...
> [3] https://git.toradex.com/cgit/meta-toradex-bsp-common.git/tree/recipes-suppor...
Whoops, this is supposed to be
[3] https://git.toradex.com/cgit/meta-toradex-bsp-common.git/tree/recipes-suppor...
On Fri, May 12, 2023 at 11:07:10AM +0200, Francesco Dolcini wrote:
> Hello all,
> I recently hit a regression on v6.4-rc1, and it seems that the exact same issue is now also happening on v6.1.28.
> I have not been able to bisect it yet, but what is happening is that libusbgx[1], which we use to configure a USB NCM gadget interface[2][3], just hangs completely at boot.
> This is happening on multiple ARM32 and ARM64 i.MX SoCs (i.MX6, i.MX7, i.MX8MM).
> The log looks something like this:
> [*] A start job is running for Load def…t schema g1.schema (6s / no limit)
> [**] A start job is running for Load def…t schema g1.schema (7s / no limit)
> [***] A start job is running for Load def…t schema g1.schema (8s / no limit)
> [ ***] A start job is running for Load def…t schema g1.schema (8s / no limit)
> I will try to bisect this and provide more useful feedback ASAP. I decided not to wait for that and to send this email now, in case someone has some insight into what is going on.
I noticed a similar problem on the Qualcomm MSM8916 SoC (chipidea USB driver) and reverting commit 0db213ea8eed ("usb: gadget: udc: core: Invoke usb_gadget_connect only when started") fixes it for me. The follow-up commit a3afbf5cc887 ("usb: gadget: udc: core: Prevent redundant calls to pullup") must be reverted first to avoid conflicts. These two were also backported into 6.1.28.
I didn't have time to investigate it further yet. With these patches it just hangs forever when setting up the USB gadget.
Stephan
On Fri, May 12, 2023 at 12:55:46PM +0200, Stephan Gerhold wrote:
> On Fri, May 12, 2023 at 11:07:10AM +0200, Francesco Dolcini wrote:
> > Hello all,
> > I recently hit a regression on v6.4-rc1, and it seems that the exact same issue is now also happening on v6.1.28.
> > I have not been able to bisect it yet, but what is happening is that libusbgx[1], which we use to configure a USB NCM gadget interface[2][3], just hangs completely at boot.
> > This is happening on multiple ARM32 and ARM64 i.MX SoCs (i.MX6, i.MX7, i.MX8MM).
> > The log looks something like this:
> > [*] A start job is running for Load def…t schema g1.schema (6s / no limit)
> > [**] A start job is running for Load def…t schema g1.schema (7s / no limit)
> > [***] A start job is running for Load def…t schema g1.schema (8s / no limit)
> > [ ***] A start job is running for Load def…t schema g1.schema (8s / no limit)
> > I will try to bisect this and provide more useful feedback ASAP. I decided not to wait for that and to send this email now, in case someone has some insight into what is going on.
> I noticed a similar problem on the Qualcomm MSM8916 SoC (chipidea USB driver) and reverting commit 0db213ea8eed ("usb: gadget: udc: core: Invoke usb_gadget_connect only when started") fixes it for me. The follow-up commit a3afbf5cc887 ("usb: gadget: udc: core: Prevent redundant calls to pullup") must be reverted first to avoid conflicts. These two were also backported into 6.1.28.
Thanks for the confirmation.
> I didn't have time to investigate it further yet. With these patches it just hangs forever when setting up the USB gadget.
I will double check that the same is happening to me and send a revert afterward.
Francesco
On Friday, 12 May 2023 12:55:46 CEST Stephan Gerhold wrote:
> On Fri, May 12, 2023 at 11:07:10AM +0200, Francesco Dolcini wrote:
> > Hello all,
> > I recently hit a regression on v6.4-rc1, and it seems that the exact same issue is now also happening on v6.1.28.
> > I have not been able to bisect it yet, but what is happening is that libusbgx[1], which we use to configure a USB NCM gadget interface[2][3], just hangs completely at boot.
> > This is happening on multiple ARM32 and ARM64 i.MX SoCs (i.MX6, i.MX7, i.MX8MM).
> > The log looks something like this:
> > [*] A start job is running for Load def…t schema g1.schema (6s / no limit)
> > [**] A start job is running for Load def…t schema g1.schema (7s / no limit)
> > [***] A start job is running for Load def…t schema g1.schema (8s / no limit)
> > [ ***] A start job is running for Load def…t schema g1.schema (8s / no limit)
> > I will try to bisect this and provide more useful feedback ASAP. I decided not to wait for that and to send this email now, in case someone has some insight into what is going on.
> I noticed a similar problem on the Qualcomm MSM8916 SoC (chipidea USB driver) and reverting commit 0db213ea8eed ("usb: gadget: udc: core: Invoke usb_gadget_connect only when started") fixes it for me. The follow-up commit a3afbf5cc887 ("usb: gadget: udc: core: Prevent redundant calls to pullup") must be reverted first to avoid conflicts. These two were also backported into 6.1.28.
Hi,
to confirm I'm seeing the same issue on Qualcomm MSM8974 and MSM8226 boards. Reverting the patches Stephan mentioned makes it work again on v6.4-rc1.
Regards
Luca
> I didn't have time to investigate it further yet. With these patches it just hangs forever when setting up the USB gadget.
> Stephan
On Fri, May 12, 2023 at 05:42:03PM +0200, Luca Weiss wrote:
> to confirm I'm seeing the same issue on Qualcomm MSM8974 and MSM8226 boards. Reverting the patches Stephan mentioned makes it work again on v6.4-rc1.
https://lore.kernel.org/all/20230512131435.205464-1-francesco@dolcini.it/
Hi all,
Thanks for reporting! Do you see the system crash, or does it wait indefinitely for the gadget to be pulled up? Is it possible to get a stack trace?
Thanks, Badhri
On Fri, May 12, 2023 at 8:44 AM Francesco Dolcini <francesco@dolcini.it> wrote:
> On Fri, May 12, 2023 at 05:42:03PM +0200, Luca Weiss wrote:
> > to confirm I'm seeing the same issue on Qualcomm MSM8974 and MSM8226 boards. Reverting the patches Stephan mentioned makes it work again on v6.4-rc1.
> https://lore.kernel.org/all/20230512131435.205464-1-francesco@dolcini.it/
On Mon, May 15, 2023 at 01:38:30PM -0700, Badhri Jagan Sridharan wrote:
> Do you see the system crash, or does it wait indefinitely for the gadget to be pulled up?
It waits indefinitely. Likely a deadlock.
> Is it possible to get a stack trace?
I was able to generate this by enabling some debugging Kconfig options:
[ 41.341580] ============================================
[ 41.349246] WARNING: possible recursive locking detected
[ 41.357120] 6.4.0-rc1-0.0.0-devel-00005-gcda3c69ebc14 #1 Not tainted
[ 41.357138] --------------------------------------------
[ 41.357143] echo/566 is trying to acquire lock:
[ 41.357153] c4b0a72c (&udc->connect_lock){+.+.}-{4:4}, at: usb_udc_vbus_handler+0x1c/0x60
[ 41.357209]
[ 41.357209] but task is already holding lock:
[ 41.357214] c4b0a72c (&udc->connect_lock){+.+.}-{4:4}, at: gadget_bind_driver+0x110/0x230
[ 41.357263]
[ 41.357263] other info that might help us debug this:
[ 41.357272] Possible unsafe locking scenario:
[ 41.357272]
[ 41.357279] CPU0
[ 41.357285] ----
[ 41.357291] lock(&udc->connect_lock);
[ 41.357304] lock(&udc->connect_lock);
[ 41.357316]
[ 41.357316] *** DEADLOCK ***
[ 41.357316]
[ 41.357319] May be due to missing lock nesting notation
[ 41.357319]
[ 41.357324] 6 locks held by echo/566:
[ 41.357332] #0: c430fabc (sb_writers#11){.+.+}-{0:0}, at: ksys_write+0x70/0xf8
[ 41.357377] #1: c5b26e98 (&buffer->mutex){+.+.}-{4:4}, at: configfs_write_iter+0x24/0x118
[ 41.357420] #2: c5284548 (&p->frag_sem){.+.+}-{4:4}, at: configfs_write_iter+0x88/0x118
[ 41.357462] #3: c55a2a20 (&gi->lock){+.+.}-{4:4}, at: gadget_dev_desc_UDC_store+0x58/0x110
[ 41.357503] #4: c4b5648c (&dev->mutex){....}-{4:4}, at: __driver_attach+0x108/0x1cc
[ 41.357538] #5: c4b0a72c (&udc->connect_lock){+.+.}-{4:4}, at: gadget_bind_driver+0x110/0x230
[ 41.357578]
[ 41.357578] stack backtrace:
[ 41.357585] CPU: 1 PID: 566 Comm: echo Not tainted 6.4.0-rc1-0.0.0-devel-00005-gcda3c69ebc14 #1
[ 41.357596] Hardware name: Freescale i.MX7 Dual (Device Tree)
[ 41.357612] unwind_backtrace from show_stack+0x10/0x14
[ 41.357639] show_stack from dump_stack_lvl+0x70/0xb0
[ 41.357660] dump_stack_lvl from __lock_acquire+0x924/0x22c4
[ 41.357681] __lock_acquire from lock_acquire+0x100/0x370
[ 41.357699] lock_acquire from __mutex_lock+0xa8/0xfb4
[ 41.357720] __mutex_lock from mutex_lock_nested+0x1c/0x24
[ 41.357742] mutex_lock_nested from usb_udc_vbus_handler+0x1c/0x60
[ 41.357769] usb_udc_vbus_handler from ci_udc_start+0x74/0x9c
[ 41.357798] ci_udc_start from gadget_bind_driver+0x130/0x230
[ 41.357824] gadget_bind_driver from really_probe+0xd8/0x3fc
[ 41.357846] really_probe from __driver_probe_device+0x94/0x1f0
[ 41.357862] __driver_probe_device from driver_probe_device+0x2c/0xc4
[ 41.357877] driver_probe_device from __driver_attach+0x114/0x1cc
[ 41.357893] __driver_attach from bus_for_each_dev+0x7c/0xcc
[ 41.357915] bus_for_each_dev from bus_add_driver+0xd4/0x200
[ 41.357942] bus_add_driver from driver_register+0x7c/0x114
[ 41.357965] driver_register from usb_gadget_register_driver_owner+0x40/0xe0
[ 41.357987] usb_gadget_register_driver_owner from gadget_dev_desc_UDC_store+0xd4/0x110
[ 41.358014] gadget_dev_desc_UDC_store from configfs_write_iter+0xac/0x118
[ 41.358042] configfs_write_iter from vfs_write+0x1b4/0x40c
[ 41.358068] vfs_write from ksys_write+0x70/0xf8
[ 41.358088] ksys_write from ret_fast_syscall+0x0/0x1c
[ 41.358106] Exception stack(0xf0f15fa8 to 0xf0f15ff0)
[ 41.358119] 5fa0: 0000000a 00a741c0 00000001 00a741c0 0000000a 00000001
[ 41.358132] 5fc0: 0000000a 00a741c0 b6f7dba0 00000004 0000000a 00000001 00000000 b6f7d388
[ 41.358141] 5fe0: 00000004 beec4b80 b6f1c1f3 b6e9b5f6
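For anyone less used to reading lockdep output: the report shows gadget_bind_driver() holding udc->connect_lock while it calls the controller's start callback (ci_udc_start here), which calls usb_udc_vbus_handler(), and that function tries to take the same mutex again. Below is a minimal userspace sketch of that call pattern, not the kernel code. Every name ending in _sketch is a hypothetical stand-in, and it assumes a POSIX error-checking mutex so that the second lock attempt returns EDEADLK instead of blocking forever as the kernel mutex does.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/* Hypothetical stand-in for struct usb_udc; only the lock matters here. */
struct udc_sketch {
	pthread_mutex_t connect_lock;
	int vbus;
};

/*
 * Use an error-checking mutex (an assumption of this sketch): relocking
 * from the owning thread returns EDEADLK rather than hanging.
 */
static void udc_sketch_init(struct udc_sketch *udc)
{
	pthread_mutexattr_t attr;
	int ret;

	ret = pthread_mutexattr_init(&attr);
	assert(ret == 0);
	ret = pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
	assert(ret == 0);
	ret = pthread_mutex_init(&udc->connect_lock, &attr);
	assert(ret == 0);
	pthread_mutexattr_destroy(&attr);
	udc->vbus = 0;
}

/* Models usb_udc_vbus_handler(): it takes connect_lock on its own. */
static int vbus_handler_sketch(struct udc_sketch *udc, int vbus)
{
	int ret = pthread_mutex_lock(&udc->connect_lock);

	if (ret != 0)
		return ret; /* EDEADLK if the caller already holds the lock */
	udc->vbus = vbus;
	pthread_mutex_unlock(&udc->connect_lock);
	return 0;
}

/* Models the controller start callback, which reports VBUS immediately. */
static int udc_start_sketch(struct udc_sketch *udc)
{
	return vbus_handler_sketch(udc, 1);
}

/* Models gadget_bind_driver(): holds connect_lock across the callback. */
static int bind_driver_sketch(struct udc_sketch *udc)
{
	int ret;

	pthread_mutex_lock(&udc->connect_lock);
	ret = udc_start_sketch(udc); /* re-enters the lock: self-deadlock */
	pthread_mutex_unlock(&udc->connect_lock);
	return ret;
}
```

Built with something like `cc -pthread`, bind_driver_sketch() on an initialised struct returns EDEADLK, while vbus_handler_sketch() called without the lock held succeeds. This only illustrates why the hang occurs; it does not suggest how the kernel side should be fixed.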
On Fri, May 12, 2023 at 11:07:10AM +0200, Francesco Dolcini wrote:
> Hello all,
> I recently hit a regression on v6.4-rc1, and it seems that the exact same issue is now also happening on v6.1.28.
> I have not been able to bisect it yet, but what is happening is that libusbgx[1], which we use to configure a USB NCM gadget interface[2][3], just hangs completely at boot.
> This is happening on multiple ARM32 and ARM64 i.MX SoCs (i.MX6, i.MX7, i.MX8MM).
> The log looks something like this:
> [*] A start job is running for Load def…t schema g1.schema (6s / no limit)
> [**] A start job is running for Load def…t schema g1.schema (7s / no limit)
> [***] A start job is running for Load def…t schema g1.schema (8s / no limit)
> [ ***] A start job is running for Load def…t schema g1.schema (8s / no limit)
> I will try to bisect this and provide more useful feedback ASAP. I decided not to wait for that and to send this email now, in case someone has some insight into what is going on.
Thanks for the report. I'm adding it to regzbot:
#regzbot ^introduced: 0db213ea8eed55
#regzbot title: libusbgx hangs completely at boot (stuck at loading g1.schema)