Sorry for the late reply.
On Fri, Feb 09, 2024, Marek Szyprowski wrote:
On 08.02.2024 23:54, Thinh Nguyen wrote:
On Wed, Feb 07, 2024, Marek Szyprowski wrote:
On 19.01.2024 10:48, Uttkarsh Aggarwal wrote:
In the current scenario, if plug-out and plug-in are performed continuously, a NULL pointer dereference may occur while checking dwc->gadget_driver in dwc3_gadget_suspend.
Call Stack:
	CPU1:                           CPU2:
	gadget_unbind_driver            dwc3_suspend_common
	dwc3_gadget_stop                dwc3_gadget_suspend
	                                dwc3_disconnect_gadget
CPU1 basically clears the variable and CPU2 checks the variable. Consider CPU1 running, right before gadget_driver is cleared. In parallel, CPU2 executes dwc3_gadget_suspend, where it finds that dwc->gadget_driver is not NULL and continues execution. CPU1 then completes execution and clears the pointer. CPU2 then executes dwc3_disconnect_gadget, where dwc->gadget_driver is already NULL, which causes the NULL pointer dereference.
Cc: stable@vger.kernel.org
Fixes: 9772b47a4c29 ("usb: dwc3: gadget: Fix suspend/resume during device mode")
Acked-by: Thinh Nguyen <Thinh.Nguyen@synopsys.com>
Signed-off-by: Uttkarsh Aggarwal <quic_uaggarwa@quicinc.com>
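The race described in the commit message above is a check-then-use problem: the pointer is tested in one place and dereferenced in another, with nothing preventing the unbind path from clearing it in between. The following is only a userspace sketch of the general remedy (check and use inside the same critical section). All names (race_dwc, race_driver, model_lock and so on) are hypothetical stand-ins, the lock is a toy C11 atomic_flag spin lock, and none of this is the actual dwc3 code or the actual patch.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for a bound gadget driver; not a kernel type. */
struct race_driver {
	int disconnect_calls;	/* counts calls to the modelled disconnect */
};

/* Hypothetical stand-in for the controller state; not struct dwc3. */
struct race_dwc {
	atomic_flag lock;			/* toy spin lock */
	struct race_driver *gadget_driver;	/* NULL once unbound */
};

static void race_dwc_init(struct race_dwc *dwc, struct race_driver *drv)
{
	assert(dwc != NULL);
	atomic_flag_clear(&dwc->lock);
	dwc->gadget_driver = drv;
}

static void model_lock(struct race_dwc *dwc)
{
	while (atomic_flag_test_and_set(&dwc->lock))
		;	/* spin until the holder clears the flag */
}

static void model_unlock(struct race_dwc *dwc)
{
	atomic_flag_clear(&dwc->lock);
}

/* Models the unbind path (CPU1): clear the pointer under the lock. */
static void model_unbind(struct race_dwc *dwc)
{
	model_lock(dwc);
	dwc->gadget_driver = NULL;
	model_unlock(dwc);
}

/*
 * Models the suspend path (CPU2): the NULL check and the dereference sit
 * in the same critical section, so the unbind path cannot clear the
 * pointer between them. Returns 1 if the driver was notified, 0 if no
 * driver was bound.
 */
static int model_suspend_locked(struct race_dwc *dwc)
{
	int notified = 0;

	model_lock(dwc);
	if (dwc->gadget_driver) {
		dwc->gadget_driver->disconnect_calls++;
		notified = 1;
	}
	model_unlock(dwc);

	return notified;
}
```

Calling model_suspend_locked() after model_unbind() simply returns 0 instead of touching a NULL pointer. In this model that holds for any interleaving of the two paths, since both take the same lock.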
This patch landed some time ago in linux-next as commit 61a348857e86 ("usb: dwc3: gadget: Fix NULL pointer dereference in dwc3_gadget_suspend"). Recently I found that it causes the following warning when no USB gadget is bound to the DWC3 driver and a system suspend/resume cycle is performed:
dwc3 12400000.usb: wait for SETUP phase timed out
dwc3 12400000.usb: failed to set STALL on ep0out
------------[ cut here ]------------
WARNING: CPU: 4 PID: 604 at drivers/usb/dwc3/ep0.c:289 dwc3_ep0_out_start+0xc8/0xcc
Modules linked in:
CPU: 4 PID: 604 Comm: rtcwake Not tainted 6.8.0-rc3-next-20240207 #7979
Hardware name: Samsung Exynos (Flattened Device Tree)
 unwind_backtrace from show_stack+0x10/0x14
 show_stack from dump_stack_lvl+0x58/0x70
 dump_stack_lvl from __warn+0x7c/0x1bc
 __warn from warn_slowpath_fmt+0x1a0/0x1a8
 warn_slowpath_fmt from dwc3_ep0_out_start+0xc8/0xcc
 dwc3_ep0_out_start from dwc3_gadget_soft_disconnect+0x16c/0x230
 dwc3_gadget_soft_disconnect from dwc3_gadget_suspend+0xc/0x90
 dwc3_gadget_suspend from dwc3_suspend_common+0x44/0x30c
 dwc3_suspend_common from dwc3_suspend+0x14/0x2c
 dwc3_suspend from dpm_run_callback+0x94/0x288
 dpm_run_callback from device_suspend+0x130/0x6d0
 device_suspend from dpm_suspend+0x124/0x35c
 dpm_suspend from dpm_suspend_start+0x64/0x6c
 dpm_suspend_start from suspend_devices_and_enter+0x134/0xbd8
 suspend_devices_and_enter from pm_suspend+0x2ec/0x380
 pm_suspend from state_store+0x68/0xc8
 state_store from kernfs_fop_write_iter+0x110/0x1d4
 kernfs_fop_write_iter from vfs_write+0x2e8/0x430
 vfs_write from ksys_write+0x5c/0xd4
 ksys_write from ret_fast_syscall+0x0/0x1c
Exception stack(0xf1421fa8 to 0xf1421ff0)
...
irq event stamp: 14304
hardirqs last enabled at (14303): [<c01a599c>] console_unlock+0x108/0x114
hardirqs last disabled at (14304): [<c0c229d8>] _raw_spin_lock_irqsave+0x64/0x68
softirqs last enabled at (13030): [<c010163c>] __do_softirq+0x318/0x4f4
softirqs last disabled at (13025): [<c012dd40>] __irq_exit_rcu+0x130/0x184
---[ end trace 0000000000000000 ]---
IMHO dwc3_gadget_soft_disconnect() requires some kind of a check if dwc->gadget_driver is present or not, as it really makes no sense to do any ep0 related operations if there is no gadget driver at all.
I don't think checking that is sufficient, and I don't think that's the case here.
If there were indeed no gadget_driver present, we wouldn't get this stack trace (i.e. dwc3_ep0_out_start should only occur when a gadget_driver is present). This is a race that happened between binding and suspend.
I have no gadget compiled into the kernel and no such created via configfs, so how can this be caused by a race?
Ah... In that case, we got through the incomplete/wrong check for dwc3_gadget_soft_disconnect():

	if (dwc->ep0state != EP0_SETUP_PHASE)
Since there's no gadget driver, the controller never started and the ep0state defaulted to EP0_UNCONNECTED, which explains why it got into the timeout condition above and incorrectly attempted to start the control transfer.
I think something like this should be sufficient. Would you mind giving it a try?
diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c
index 564976b3e2b9..1990d6371066 100644
--- a/drivers/usb/dwc3/gadget.c
+++ b/drivers/usb/dwc3/gadget.c
@@ -2656,6 +2656,11 @@ static int dwc3_gadget_soft_disconnect(struct dwc3 *dwc)
 	int ret;
 
 	spin_lock_irqsave(&dwc->lock, flags);
+	if (!dwc->pullups_connected) {
+		spin_unlock_irqrestore(&dwc->lock, flags);
+		return 0;
+	}
+
 	dwc->connected = false;
 
 	/*
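
To make the reasoning above concrete, here is a small userspace model of the decision being discussed. It is only a sketch: the sd_ and SD_ names are hypothetical, the enum has just the two states needed for the argument, and the "wait for SETUP phase" step is reduced to a counter. It assumes, as stated in the thread, that ep0state defaults to the unconnected state when the controller was never started.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified, hypothetical model of the ep0 states named in the thread. */
enum sd_ep0_state {
	SD_EP0_UNCONNECTED = 0,	/* default when the controller never started */
	SD_EP0_SETUP_PHASE,
};

/* Hypothetical stand-in for the controller state; not struct dwc3. */
struct sd_dwc {
	bool pullups_connected;		/* false if the controller never started */
	enum sd_ep0_state ep0state;
	int setup_waits;		/* times we would wait for the SETUP phase */
};

/*
 * Models the soft-disconnect decision. With with_guard == false only the
 * ep0state check is applied, which is also true for SD_EP0_UNCONNECTED,
 * so a never-started controller still ends up "waiting" for a SETUP
 * phase. With with_guard == true the function returns early when
 * pullups_connected is false, as the proposed diff does.
 */
static int sd_soft_disconnect(struct sd_dwc *dwc, bool with_guard)
{
	assert(dwc != NULL);

	if (with_guard && !dwc->pullups_connected)
		return 0;	/* nothing to disconnect */

	if (dwc->ep0state != SD_EP0_SETUP_PHASE)
		dwc->setup_waits++;	/* stands in for the timed wait */

	return 0;
}
```

With pullups_connected == false and ep0state == SD_EP0_UNCONNECTED, the unguarded call increments setup_waits, which corresponds to the "wait for SETUP phase timed out" path in the log, while the guarded call leaves it untouched.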
This patch fixes the reported issue. Feel free to add:
Tested-by: Marek Szyprowski <m.szyprowski@samsung.com>
Thanks for the report and Tested-by! I'll send a fix patch soon.
BR,
Thinh