On Fri, Aug 08, 2025 at 06:23:05PM +0530, Selvarasu Ganesan wrote:
This commit addresses a rarely observed endpoint command timeout which causes kernel panic due to warn when 'panic_on_warn' is enabled and unnecessary call trace prints when 'panic_on_warn' is disabled. It is seen during fast software-controlled connect/disconnect testcases. The following is one such endpoint command timeout that we observed:
Connect
->dwc3_thread_interrupt ->dwc3_ep0_interrupt ->configfs_composite_setup ->composite_setup ->usb_ep_queue ->dwc3_gadget_ep0_queue ->__dwc3_gadget_ep0_queue ->__dwc3_ep0_do_control_data ->dwc3_send_gadget_ep_cmd
Disconnect
->dwc3_thread_interrupt ->dwc3_gadget_disconnect_interrupt ->dwc3_ep0_reset_state ->dwc3_ep0_end_control_data ->dwc3_send_gadget_ep_cmd
In the issue scenario, in Exynos platforms, we observed that control transfers for the previous connect have not yet been completed and end transfer command sent as a part of the disconnect sequence and processing of USB_ENDPOINT_HALT feature request from the host timeout. This maybe an expected scenario since the controller is processing EP commands sent as a part of the previous connect. It maybe better to remove WARN_ON in all places where device endpoint commands are sent to avoid unnecessary kernel panic due to warn.
Cc: stable@vger.kernel.org Co-developed-by: Akash M akash.m5@samsung.com Signed-off-by: Akash M akash.m5@samsung.com Signed-off-by: Selvarasu Ganesan selvarasu.g@samsung.com Acked-by: Thinh Nguyen Thinh.Nguyen@synopsys.com
Changes in v3:
- Added Co-developed-by tags to reflect the correct authorship.
- And Added Acked-by tag as well.
Link to v2: https://lore.kernel.org/all/20250807014639.1596-1-selvarasu.g@samsung.com/
Changes in v2:
- Removed the 'Fixes' tag from the commit message, as this patch does not contain a fix.
- And Retained the 'stable' tag, as these changes are intended to be applied across all stable kernels.
- Additionally, replaced 'dev_warn*' with 'dev_err*'."
Link to v1: https://lore.kernel.org/all/20250807005638.thhsgjn73aaov2af@synopsys.com/
drivers/usb/dwc3/ep0.c | 20 ++++++++++++++++---- drivers/usb/dwc3/gadget.c | 10 ++++++++-- 2 files changed, 24 insertions(+), 6 deletions(-)
diff --git a/drivers/usb/dwc3/ep0.c b/drivers/usb/dwc3/ep0.c index 666ac432f52d..b4229aa13f37 100644 --- a/drivers/usb/dwc3/ep0.c +++ b/drivers/usb/dwc3/ep0.c @@ -288,7 +288,9 @@ void dwc3_ep0_out_start(struct dwc3 *dwc) dwc3_ep0_prepare_one_trb(dep, dwc->ep0_trb_addr, 8, DWC3_TRBCTL_CONTROL_SETUP, false); ret = dwc3_ep0_start_trans(dep);
- WARN_ON(ret < 0);
- if (ret < 0)
dev_err(dwc->dev, "ep0 out start transfer failed: %d\n", ret);
If this fails, why aren't you returning the error and handling it properly? Just throwing an error message feels like it's not going to do much overall.
for (i = 2; i < DWC3_ENDPOINTS_NUM; i++) { struct dwc3_ep *dwc3_ep; @@ -1061,7 +1063,9 @@ static void __dwc3_ep0_do_control_data(struct dwc3 *dwc, ret = dwc3_ep0_start_trans(dep); }
- WARN_ON(ret < 0);
- if (ret < 0)
dev_err(dwc->dev,
"ep0 data phase start transfer failed: %d\n", ret);
Same here, why not return the error and propagate it up the call stack?
} static int dwc3_ep0_start_control_status(struct dwc3_ep *dep) @@ -1078,7 +1082,12 @@ static int dwc3_ep0_start_control_status(struct dwc3_ep *dep) static void __dwc3_ep0_do_control_status(struct dwc3 *dwc, struct dwc3_ep *dep) {
- WARN_ON(dwc3_ep0_start_control_status(dep));
- int ret;
- ret = dwc3_ep0_start_control_status(dep);
- if (ret)
dev_err(dwc->dev,
"ep0 status phase start transfer failed: %d\n", ret);
Same here. Don't "swallow" errors that you find, that's a sure way to paper over real problems.
Same for all other changes here.
thanks,
greg k-h