The below “No resource for ep” warning appears when a StartTransfer command is issued for bulk or interrupt endpoints in `dwc3_gadget_ep_enable` while a previous StartTransfer on the same endpoint is still in progress. The gadget functions drivers can invoke `usb_ep_enable` (which triggers a new StartTransfer command) before the earlier transfer has completed. Because the previous StartTransfer is still active, `dwc3_gadget_ep_disable` can skip the required `EndTransfer` due to `DWC3_EP_DELAY_STOP`, leading to the endpoint resources are busy for previous StartTransfer and warning ("No resource for ep") from dwc3 driver.
To resolve this, a check is added to `dwc3_gadget_ep_enable` that checks the `DWC3_EP_TRANSFER_STARTED` flag before issuing a new StartTransfer. By preventing a second StartTransfer on an already busy endpoint, the resource conflict is eliminated, the warning disappears, and potential kernel panics caused by `panic_on_warn` are avoided.
------------[ cut here ]------------ dwc3 13200000.dwc3: No resource for ep1out WARNING: CPU: 0 PID: 700 at drivers/usb/dwc3/gadget.c:398 dwc3_send_gadget_ep_cmd+0x2f8/0x76c Call trace: dwc3_send_gadget_ep_cmd+0x2f8/0x76c __dwc3_gadget_ep_enable+0x490/0x7c0 dwc3_gadget_ep_enable+0x6c/0xe4 usb_ep_enable+0x5c/0x15c mp_eth_stop+0xd4/0x11c __dev_close_many+0x160/0x1c8 __dev_change_flags+0xfc/0x220 dev_change_flags+0x24/0x70 devinet_ioctl+0x434/0x524 inet_ioctl+0xa8/0x224 sock_do_ioctl+0x74/0x128 sock_ioctl+0x3bc/0x468 __arm64_sys_ioctl+0xa8/0xe4 invoke_syscall+0x58/0x10c el0_svc_common+0xa8/0xdc do_el0_svc+0x1c/0x28 el0_svc+0x38/0x88 el0t_64_sync_handler+0x70/0xbc el0t_64_sync+0x1a8/0x1ac
Fixes: a97ea994605e ("usb: dwc3: gadget: offset Start Transfer latency for bulk EPs") Cc: stable@vger.kernel.org Signed-off-by: Selvarasu Ganesan selvarasu.g@samsung.com ---
Changes in v2: - Removed change-id. - Updated commit message. Link to v1: https://lore.kernel.org/linux-usb/20251117152812.622-1-selvarasu.g@samsung.c... --- drivers/usb/dwc3/gadget.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c index 1f67fb6aead5..8d3caa71ea12 100644 --- a/drivers/usb/dwc3/gadget.c +++ b/drivers/usb/dwc3/gadget.c @@ -963,8 +963,9 @@ static int __dwc3_gadget_ep_enable(struct dwc3_ep *dep, unsigned int action) * Issue StartTransfer here with no-op TRB so we can always rely on No * Response Update Transfer command. */ - if (usb_endpoint_xfer_bulk(desc) || - usb_endpoint_xfer_int(desc)) { + if ((usb_endpoint_xfer_bulk(desc) || + usb_endpoint_xfer_int(desc)) && + !(dep->flags & DWC3_EP_TRANSFER_STARTED)) { struct dwc3_gadget_ep_cmd_params params; struct dwc3_trb *trb; dma_addr_t trb_dma;
On Mon, Nov 17, 2025, Selvarasu Ganesan wrote:
The below “No resource for ep” warning appears when a StartTransfer command is issued for bulk or interrupt endpoints in `dwc3_gadget_ep_enable` while a previous StartTransfer on the same endpoint is still in progress. The gadget functions drivers can invoke `usb_ep_enable` (which triggers a new StartTransfer command) before the earlier transfer has completed. Because the previous StartTransfer is still active, `dwc3_gadget_ep_disable` can skip the required `EndTransfer` due to `DWC3_EP_DELAY_STOP`, leading to the endpoint resources are busy for previous StartTransfer and warning ("No resource for ep") from dwc3 driver.
To resolve this, a check is added to `dwc3_gadget_ep_enable` that checks the `DWC3_EP_TRANSFER_STARTED` flag before issuing a new StartTransfer. By preventing a second StartTransfer on an already busy endpoint, the resource conflict is eliminated, the warning disappears, and potential kernel panics caused by `panic_on_warn` are avoided.
------------[ cut here ]------------ dwc3 13200000.dwc3: No resource for ep1out WARNING: CPU: 0 PID: 700 at drivers/usb/dwc3/gadget.c:398 dwc3_send_gadget_ep_cmd+0x2f8/0x76c Call trace: dwc3_send_gadget_ep_cmd+0x2f8/0x76c __dwc3_gadget_ep_enable+0x490/0x7c0 dwc3_gadget_ep_enable+0x6c/0xe4 usb_ep_enable+0x5c/0x15c mp_eth_stop+0xd4/0x11c __dev_close_many+0x160/0x1c8 __dev_change_flags+0xfc/0x220 dev_change_flags+0x24/0x70 devinet_ioctl+0x434/0x524 inet_ioctl+0xa8/0x224 sock_do_ioctl+0x74/0x128 sock_ioctl+0x3bc/0x468 __arm64_sys_ioctl+0xa8/0xe4 invoke_syscall+0x58/0x10c el0_svc_common+0xa8/0xdc do_el0_svc+0x1c/0x28 el0_svc+0x38/0x88 el0t_64_sync_handler+0x70/0xbc el0t_64_sync+0x1a8/0x1ac
Fixes: a97ea994605e ("usb: dwc3: gadget: offset Start Transfer latency for bulk EPs") Cc: stable@vger.kernel.org Signed-off-by: Selvarasu Ganesan selvarasu.g@samsung.com
Changes in v2:
- Removed change-id.
- Updated commit message.
Link to v1: https://urldefense.com/v3/__https://lore.kernel.org/linux-usb/20251117152812...
drivers/usb/dwc3/gadget.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c index 1f67fb6aead5..8d3caa71ea12 100644 --- a/drivers/usb/dwc3/gadget.c +++ b/drivers/usb/dwc3/gadget.c @@ -963,8 +963,9 @@ static int __dwc3_gadget_ep_enable(struct dwc3_ep *dep, unsigned int action) * Issue StartTransfer here with no-op TRB so we can always rely on No * Response Update Transfer command. */
- if (usb_endpoint_xfer_bulk(desc) ||
usb_endpoint_xfer_int(desc)) {
- if ((usb_endpoint_xfer_bulk(desc) ||
usb_endpoint_xfer_int(desc)) && struct dwc3_gadget_ep_cmd_params params; struct dwc3_trb *trb; dma_addr_t trb_dma;!(dep->flags & DWC3_EP_TRANSFER_STARTED)) {-- 2.34.1
Thanks for the catch. The problem is that the "ep_disable" process should be completed after usb_ep_disable is completed. But currently it may be an async call.
This brings up some conflicting wording of the gadget API regarding usb_ep_disable. Here's the doc regarding usb_ep_disable:
/** * usb_ep_disable - endpoint is no longer usable * @ep:the endpoint being unconfigured. may not be the endpoint named "ep0". * * no other task may be using this endpoint when this is called. * any pending and uncompleted requests will complete with status * indicating disconnect (-ESHUTDOWN) before this call returns. * gadget drivers must call usb_ep_enable() again before queueing * requests to the endpoint. * * This routine may be called in an atomic (interrupt) context. * * returns zero, or a negative error code. */
It expects all requests to be completed and given back on completion. It also notes that this can also be called in interrupt context. Currently, there's a scenario where dwc3 may not want to give back the requests right away (ie. DWC3_EP_DELAY_STOP). To fix that in dwc3, it would need to "wait" for the right condition. But waiting does not make sense in interrupt context. (We could busy-poll to satisfy the interrupt context, but that doesn't sound right either)
This was updated from process context only to may be called in interrupt context:
b0d5d2a71641 ("usb: gadget: udc: core: Revise comments for USB ep enable/disable")
Hi Alan,
Can you help give your opinion on this?
Thanks, Thinh
On 11/18/2025 7:51 AM, Thinh Nguyen wrote:
On Mon, Nov 17, 2025, Selvarasu Ganesan wrote:
The below “No resource for ep” warning appears when a StartTransfer command is issued for bulk or interrupt endpoints in `dwc3_gadget_ep_enable` while a previous StartTransfer on the same endpoint is still in progress. The gadget functions drivers can invoke `usb_ep_enable` (which triggers a new StartTransfer command) before the earlier transfer has completed. Because the previous StartTransfer is still active, `dwc3_gadget_ep_disable` can skip the required `EndTransfer` due to `DWC3_EP_DELAY_STOP`, leading to the endpoint resources are busy for previous StartTransfer and warning ("No resource for ep") from dwc3 driver.
To resolve this, a check is added to `dwc3_gadget_ep_enable` that checks the `DWC3_EP_TRANSFER_STARTED` flag before issuing a new StartTransfer. By preventing a second StartTransfer on an already busy endpoint, the resource conflict is eliminated, the warning disappears, and potential kernel panics caused by `panic_on_warn` are avoided.
------------[ cut here ]------------ dwc3 13200000.dwc3: No resource for ep1out WARNING: CPU: 0 PID: 700 at drivers/usb/dwc3/gadget.c:398 dwc3_send_gadget_ep_cmd+0x2f8/0x76c Call trace: dwc3_send_gadget_ep_cmd+0x2f8/0x76c __dwc3_gadget_ep_enable+0x490/0x7c0 dwc3_gadget_ep_enable+0x6c/0xe4 usb_ep_enable+0x5c/0x15c mp_eth_stop+0xd4/0x11c __dev_close_many+0x160/0x1c8 __dev_change_flags+0xfc/0x220 dev_change_flags+0x24/0x70 devinet_ioctl+0x434/0x524 inet_ioctl+0xa8/0x224 sock_do_ioctl+0x74/0x128 sock_ioctl+0x3bc/0x468 __arm64_sys_ioctl+0xa8/0xe4 invoke_syscall+0x58/0x10c el0_svc_common+0xa8/0xdc do_el0_svc+0x1c/0x28 el0_svc+0x38/0x88 el0t_64_sync_handler+0x70/0xbc el0t_64_sync+0x1a8/0x1ac
Fixes: a97ea994605e ("usb: dwc3: gadget: offset Start Transfer latency for bulk EPs") Cc: stable@vger.kernel.org Signed-off-by: Selvarasu Ganesan selvarasu.g@samsung.com
Changes in v2:
- Removed change-id.
- Updated commit message.
Link to v1: https://urldefense.com/v3/__https://lore.kernel.org/linux-usb/20251117152812...
drivers/usb/dwc3/gadget.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c index 1f67fb6aead5..8d3caa71ea12 100644 --- a/drivers/usb/dwc3/gadget.c +++ b/drivers/usb/dwc3/gadget.c @@ -963,8 +963,9 @@ static int __dwc3_gadget_ep_enable(struct dwc3_ep *dep, unsigned int action) * Issue StartTransfer here with no-op TRB so we can always rely on No * Response Update Transfer command. */
- if (usb_endpoint_xfer_bulk(desc) ||
usb_endpoint_xfer_int(desc)) {
- if ((usb_endpoint_xfer_bulk(desc) ||
usb_endpoint_xfer_int(desc)) && struct dwc3_gadget_ep_cmd_params params; struct dwc3_trb *trb; dma_addr_t trb_dma;!(dep->flags & DWC3_EP_TRANSFER_STARTED)) {-- 2.34.1
Thanks for the catch. The problem is that the "ep_disable" process should be completed after usb_ep_disable is completed. But currently it may be an async call.
This brings up some conflicting wording of the gadget API regarding usb_ep_disable. Here's the doc regarding usb_ep_disable:
/** * usb_ep_disable - endpoint is no longer usable * @ep:the endpoint being unconfigured. may not be the endpoint named "ep0". * * no other task may be using this endpoint when this is called. * any pending and uncompleted requests will complete with status * indicating disconnect (-ESHUTDOWN) before this call returns. * gadget drivers must call usb_ep_enable() again before queueing * requests to the endpoint. * * This routine may be called in an atomic (interrupt) context. * * returns zero, or a negative error code. */
It expects all requests to be completed and given back on completion. It also notes that this can also be called in interrupt context. Currently, there's a scenario where dwc3 may not want to give back the requests right away (ie. DWC3_EP_DELAY_STOP). To fix that in dwc3, it would need
Hi Thinh, Thanks for your comments. I agree with you on dwc3 may not want to give back the requests right away (ie. DWC3_EP_DELAY_STOP) and might also ignore the End Transfer command. Consequently, there’s no point in scheduling a new Start Transfer before the previous one has completed. The traces below illustrate this: for EP1OUT the End Transfer never occurs, so a new Start Transfer is issued, which eventually ends with a “No Resource” error.
1. EP disable for ep1in: (working EP) ----------------------------------
dwc3_gadget_ep_disable: ep1in: mps 512/1024 streams 16 burst 0 ring 8/0 flags E:swBp:< dwc3_gadget_ep_cmd: ep1in: cmd 'End Transfer' [30c08] params 00000000 00000000 00000000 --> status: Successful dwc3_gadget_giveback: ep1in: req 00000000cfc03ba0 length 0/194 Zsi ==> -108 dwc3_gadget_giveback: ep1in: req 0000000075196149 length 0/194 Zsi ==> -108 dwc3_gadget_giveback: ep1in: req 000000002b2ffaa2 length 0/86 Zsi ==> -108 2. EP enable for ep1out:(Not working EP) -->(No End Transfer) ----------------------------------------------------------
dwc3_gadget_ep_disable: ep1out: mps 512/682 streams 16 burst 0 ring 114/74 flags E:swBp:> 3. EP disable for ep1in: -----------------------
dwc3_gadget_ep_cmd: ep1in: cmd 'Set Endpoint Configuration' [401] params 00021004 06000200 00000000 --> status: Successful dwc3_gadget_ep_cmd: ep1in: cmd 'Start Transfer' [406] params 00000008 ad6b9000 00000000 --> status: Successful dwc3_gadget_ep_enable: ep1in: mps 512/1024 streams 16 burst 0 ring 0/0 flags E:swBp:< 4. EP enable for ep1out:(Triggered start Transfer without End Transfer) ----------------------------------------------------------------------
dwc3_gadget_ep_cmd: ep1out: cmd 'Set Endpoint Configuration' [401] params 00001004 04000200 00000000 --> status: Successful dwc3_gadget_ep_cmd: ep1out: cmd 'Start Transfer' [406] params 00000008 ad6b7000 00000000 --> status: No Resource dwc3_gadget_ep_enable: ep1out: mps 512/682 streams 16 burst 0 ring 114/74 flags E:swBp:>
The given fix will prevent in this failure case ,
dwc3_gadget_ep_enable: ---------------------
+ if ((usb_endpoint_xfer_bulk(desc) || + usb_endpoint_xfer_int(desc)) && + !(dep->flags & DWC3_EP_TRANSFER_STARTED)) {
Thanks, Selva
to "wait" for the right condition. But waiting does not make sense in interrupt context. (We could busy-poll to satisfy the interrupt context, but that doesn't sound right either)
This was updated from process context only to may be called in interrupt context:
b0d5d2a71641 ("usb: gadget: udc: core: Revise comments for USB ep enable/disable")
Hi Alan,
Can you help give your opinion on this?
Thanks, Thinh
On Tue, Nov 18, 2025, Selvarasu Ganesan wrote:
On 11/18/2025 7:51 AM, Thinh Nguyen wrote:
On Mon, Nov 17, 2025, Selvarasu Ganesan wrote:
The below “No resource for ep” warning appears when a StartTransfer command is issued for bulk or interrupt endpoints in `dwc3_gadget_ep_enable` while a previous StartTransfer on the same endpoint is still in progress. The gadget functions drivers can invoke `usb_ep_enable` (which triggers a new StartTransfer command) before the earlier transfer has completed. Because the previous StartTransfer is still active, `dwc3_gadget_ep_disable` can skip the required `EndTransfer` due to `DWC3_EP_DELAY_STOP`, leading to the endpoint resources are busy for previous StartTransfer and warning ("No resource for ep") from dwc3 driver.
To resolve this, a check is added to `dwc3_gadget_ep_enable` that checks the `DWC3_EP_TRANSFER_STARTED` flag before issuing a new StartTransfer. By preventing a second StartTransfer on an already busy endpoint, the resource conflict is eliminated, the warning disappears, and potential kernel panics caused by `panic_on_warn` are avoided.
------------[ cut here ]------------ dwc3 13200000.dwc3: No resource for ep1out WARNING: CPU: 0 PID: 700 at drivers/usb/dwc3/gadget.c:398 dwc3_send_gadget_ep_cmd+0x2f8/0x76c Call trace: dwc3_send_gadget_ep_cmd+0x2f8/0x76c __dwc3_gadget_ep_enable+0x490/0x7c0 dwc3_gadget_ep_enable+0x6c/0xe4 usb_ep_enable+0x5c/0x15c mp_eth_stop+0xd4/0x11c __dev_close_many+0x160/0x1c8 __dev_change_flags+0xfc/0x220 dev_change_flags+0x24/0x70 devinet_ioctl+0x434/0x524 inet_ioctl+0xa8/0x224 sock_do_ioctl+0x74/0x128 sock_ioctl+0x3bc/0x468 __arm64_sys_ioctl+0xa8/0xe4 invoke_syscall+0x58/0x10c el0_svc_common+0xa8/0xdc do_el0_svc+0x1c/0x28 el0_svc+0x38/0x88 el0t_64_sync_handler+0x70/0xbc el0t_64_sync+0x1a8/0x1ac
Fixes: a97ea994605e ("usb: dwc3: gadget: offset Start Transfer latency for bulk EPs") Cc: stable@vger.kernel.org Signed-off-by: Selvarasu Ganesan selvarasu.g@samsung.com
Changes in v2:
- Removed change-id.
- Updated commit message.
Link to v1: https://urldefense.com/v3/__https://lore.kernel.org/linux-usb/20251117152812...
drivers/usb/dwc3/gadget.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c index 1f67fb6aead5..8d3caa71ea12 100644 --- a/drivers/usb/dwc3/gadget.c +++ b/drivers/usb/dwc3/gadget.c @@ -963,8 +963,9 @@ static int __dwc3_gadget_ep_enable(struct dwc3_ep *dep, unsigned int action) * Issue StartTransfer here with no-op TRB so we can always rely on No * Response Update Transfer command. */
- if (usb_endpoint_xfer_bulk(desc) ||
usb_endpoint_xfer_int(desc)) {
- if ((usb_endpoint_xfer_bulk(desc) ||
usb_endpoint_xfer_int(desc)) && struct dwc3_gadget_ep_cmd_params params; struct dwc3_trb *trb; dma_addr_t trb_dma;!(dep->flags & DWC3_EP_TRANSFER_STARTED)) {-- 2.34.1
Thanks for the catch. The problem is that the "ep_disable" process should be completed after usb_ep_disable is completed. But currently it may be an async call.
This brings up some conflicting wording of the gadget API regarding usb_ep_disable. Here's the doc regarding usb_ep_disable:
/** * usb_ep_disable - endpoint is no longer usable * @ep:the endpoint being unconfigured. may not be the endpoint named "ep0". * * no other task may be using this endpoint when this is called. * any pending and uncompleted requests will complete with status * indicating disconnect (-ESHUTDOWN) before this call returns. * gadget drivers must call usb_ep_enable() again before queueing * requests to the endpoint. * * This routine may be called in an atomic (interrupt) context. * * returns zero, or a negative error code. */
It expects all requests to be completed and given back on completion. It also notes that this can also be called in interrupt context. Currently, there's a scenario where dwc3 may not want to give back the requests right away (ie. DWC3_EP_DELAY_STOP). To fix that in dwc3, it would need
Hi Thinh, Thanks for your comments. I agree with you on dwc3 may not want to give back the requests right away (ie. DWC3_EP_DELAY_STOP) and might also ignore the End Transfer command. Consequently, there’s no point in scheduling a new Start Transfer before the previous one has completed. The traces below illustrate this: for EP1OUT the End Transfer never occurs, so a new Start Transfer is issued, which eventually ends with a “No Resource” error.
Hi,
In your point, we may as well remove the Start Transfer for all cases. It actually doesn't impact performance, or required, or needed for No Response Update Transfer command flow.
The reason we still have that is actually for bulk streams. Your change actually doesn't fix for the case of bulk streams. (Needed to start and stop the endpoint to bring it to a certain state as documented in the same function).
The problem I'm bringing up is related to the dwc3 not really following the usb_ep_disable() doc in 1 scenario. Depending on how we want to fix that, then we may not need to change here.
BR, Thinh
On Tue, Nov 18, 2025 at 02:21:17AM +0000, Thinh Nguyen wrote:
Thanks for the catch. The problem is that the "ep_disable" process should be completed after usb_ep_disable is completed. But currently it may be an async call.
This brings up some conflicting wording of the gadget API regarding usb_ep_disable. Here's the doc regarding usb_ep_disable:
/** * usb_ep_disable - endpoint is no longer usable * @ep:the endpoint being unconfigured. may not be the endpoint named "ep0". * * no other task may be using this endpoint when this is called. * any pending and uncompleted requests will complete with status * indicating disconnect (-ESHUTDOWN) before this call returns. * gadget drivers must call usb_ep_enable() again before queueing * requests to the endpoint. * * This routine may be called in an atomic (interrupt) context. * * returns zero, or a negative error code. */
It expects all requests to be completed and given back on completion. It also notes that this can also be called in interrupt context. Currently, there's a scenario where dwc3 may not want to give back the requests right away (ie. DWC3_EP_DELAY_STOP). To fix that in dwc3, it would need to "wait" for the right condition. But waiting does not make sense in interrupt context. (We could busy-poll to satisfy the interrupt context, but that doesn't sound right either)
This was updated from process context only to may be called in interrupt context:
b0d5d2a71641 ("usb: gadget: udc: core: Revise comments for USB ep enable/disable")
Hi Alan,
Can you help give your opinion on this?
Well, I think the change to the API was made because drivers _were_ calling these routines in interrupt context. That's what the commit's description says, anyway.
One way out of the problem would be to change the kerneldoc for usb_ep_disable(). Instead of saying that pending requests will complete before the all returns, say that the the requests will be marked for cancellation (with -ESHUTDOWN) before the call returns, but the actual completions might happen asynchronously later on.
The difficulty comes when a gadget driver has to handle a Set-Interface request, or Set-Config for the same configuration. The endpoints for the old altsetting/config have to be disabled and then the endpoints for the new altsetting/config have to be enabled, all while managing any pending requests. I don't know how various function drivers handle this, just that f_mass_storage is very careful about taking care of everything in a separate kernel thread that explicitly dequeues the pending requests and flushes the endpoints. In fact, this scenario was the whole reason for inventing the DELAYED_STATUS mechanism, because it was impossible to do all the necessary work within the callback routine for a control-request interrupt handler.
Alan Stern
On Mon, Nov 17, 2025, Alan Stern wrote:
On Tue, Nov 18, 2025 at 02:21:17AM +0000, Thinh Nguyen wrote:
Thanks for the catch. The problem is that the "ep_disable" process should be completed after usb_ep_disable is completed. But currently it may be an async call.
This brings up some conflicting wording of the gadget API regarding usb_ep_disable. Here's the doc regarding usb_ep_disable:
/** * usb_ep_disable - endpoint is no longer usable * @ep:the endpoint being unconfigured. may not be the endpoint named "ep0". * * no other task may be using this endpoint when this is called. * any pending and uncompleted requests will complete with status * indicating disconnect (-ESHUTDOWN) before this call returns. * gadget drivers must call usb_ep_enable() again before queueing * requests to the endpoint. * * This routine may be called in an atomic (interrupt) context. * * returns zero, or a negative error code. */
It expects all requests to be completed and given back on completion. It also notes that this can also be called in interrupt context. Currently, there's a scenario where dwc3 may not want to give back the requests right away (ie. DWC3_EP_DELAY_STOP). To fix that in dwc3, it would need to "wait" for the right condition. But waiting does not make sense in interrupt context. (We could busy-poll to satisfy the interrupt context, but that doesn't sound right either)
This was updated from process context only to may be called in interrupt context:
b0d5d2a71641 ("usb: gadget: udc: core: Revise comments for USB ep enable/disable")
Hi Alan,
Can you help give your opinion on this?
Well, I think the change to the API was made because drivers _were_ calling these routines in interrupt context. That's what the commit's description says, anyway.
One way out of the problem would be to change the kerneldoc for usb_ep_disable(). Instead of saying that pending requests will complete before the all returns, say that the the requests will be marked for cancellation (with -ESHUTDOWN) before the call returns, but the actual completions might happen asynchronously later on.
The burden of synchronization would be shifted to the gadget drivers. The problem with this is that gadget drivers may modify the requests after usb_ep_disable() when it should not (e.g. the controller may still be processing the buffer). Also, gadget drivers shouldn't call usb_ep_enabled() until the requests are returned.
The difficulty comes when a gadget driver has to handle a Set-Interface request, or Set-Config for the same configuration. The endpoints for the old altsetting/config have to be disabled and then the endpoints for the new altsetting/config have to be enabled, all while managing any
Right.
pending requests. I don't know how various function drivers handle this, just that f_mass_storage is very careful about taking care of everything in a separate kernel thread that explicitly dequeues the pending requests and flushes the endpoints. In fact, this scenario was the whole reason for inventing the DELAYED_STATUS mechanism, because it was impossible to do all the necessary work within the callback routine for a control-request interrupt handler.
If we want to keep usb_ep_disable in interrupt context, should we revise the wording such that gadget drivers must ensure pending requests are dequeued before calling usb_ep_disable()? That requests are expected to be given back before usb_ep_disable().
Or perhaps revert the commit above (commit b0d5d2a71641), fix the dwc3 driver for that, and gadget drivers need to follow the original requirement.
BR, Thinh
On Wed, Nov 19, 2025 at 01:49:12AM +0000, Thinh Nguyen wrote:
On Mon, Nov 17, 2025, Alan Stern wrote:
Hi Alan,
Can you help give your opinion on this?
Well, I think the change to the API was made because drivers _were_ calling these routines in interrupt context. That's what the commit's description says, anyway.
One way out of the problem would be to change the kerneldoc for usb_ep_disable(). Instead of saying that pending requests will complete before the all returns, say that the the requests will be marked for cancellation (with -ESHUTDOWN) before the call returns, but the actual completions might happen asynchronously later on.
The burden of synchronization would be shifted to the gadget drivers. The problem with this is that gadget drivers may modify the requests after usb_ep_disable() when it should not (e.g. the controller may still be processing the buffer). Also, gadget drivers shouldn't call usb_ep_enabled() until the requests are returned.
No, they probably shouldn't, although I don't know if that would actually cause any trouble. It's not a good idea, in any case -- particularly if the drivers want to re-use the same requests as before.
The problem is that function drivers' ->set_alt() callbacks are expected to do two things: disable all the endpoints from the old altsetting and enable all the endpoints in the new altsetting. There's no way for any part of the gadget core to intervene between those things (for instance, to wait for requests to complete).
The difficulty comes when a gadget driver has to handle a Set-Interface request, or Set-Config for the same configuration. The endpoints for the old altsetting/config have to be disabled and then the endpoints for the new altsetting/config have to be enabled, all while managing any
Right.
pending requests. I don't know how various function drivers handle this, just that f_mass_storage is very careful about taking care of everything in a separate kernel thread that explicitly dequeues the pending requests and flushes the endpoints. In fact, this scenario was the whole reason for inventing the DELAYED_STATUS mechanism, because it was impossible to do all the necessary work within the callback routine for a control-request interrupt handler.
If we want to keep usb_ep_disable in interrupt context, should we revise the wording such that gadget drivers must ensure pending requests are dequeued before calling usb_ep_disable()? That requests are expected to be given back before usb_ep_disable().
Or perhaps revert the commit above (commit b0d5d2a71641), fix the dwc3 driver for that, and gadget drivers need to follow the original requirement.
Function drivers would have to go to great lengths to guarantee that requests had completed before the endpoint is re-enabled. Right now their ->set_alt() callback routines are designed to run in interrupt context; they can't afford to wait for requests to complete.
The easiest way out is for usb_ep_disable() to do what the kerneldoc says: ensure that pending requests do complete before it returns. Can dwc3 do this? (And what if at some time in the future we want to start using an asynchronous bottom half for request completions, like usbcore does for URBs?)
Let's face it; the situation is a mess.
Alan Stern
On Tue, Nov 18, 2025, Alan Stern wrote:
On Wed, Nov 19, 2025 at 01:49:12AM +0000, Thinh Nguyen wrote:
On Mon, Nov 17, 2025, Alan Stern wrote:
Hi Alan,
Can you help give your opinion on this?
Well, I think the change to the API was made because drivers _were_ calling these routines in interrupt context. That's what the commit's description says, anyway.
One way out of the problem would be to change the kerneldoc for usb_ep_disable(). Instead of saying that pending requests will complete before the all returns, say that the the requests will be marked for cancellation (with -ESHUTDOWN) before the call returns, but the actual completions might happen asynchronously later on.
The burden of synchronization would be shifted to the gadget drivers. The problem with this is that gadget drivers may modify the requests after usb_ep_disable() when it should not (e.g. the controller may still be processing the buffer). Also, gadget drivers shouldn't call usb_ep_enabled() until the requests are returned.
No, they probably shouldn't, although I don't know if that would actually cause any trouble. It's not a good idea, in any case -- particularly if the drivers want to re-use the same requests as before.
Right.
The problem is that function drivers' ->set_alt() callbacks are expected to do two things: disable all the endpoints from the old altsetting and enable all the endpoints in the new altsetting. There's no way for any part of the gadget core to intervene between those things (for instance, to wait for requests to complete).
The difficulty comes when a gadget driver has to handle a Set-Interface request, or Set-Config for the same configuration. The endpoints for the old altsetting/config have to be disabled and then the endpoints for the new altsetting/config have to be enabled, all while managing any
Right.
pending requests. I don't know how various function drivers handle this, just that f_mass_storage is very careful about taking care of everything in a separate kernel thread that explicitly dequeues the pending requests and flushes the endpoints. In fact, this scenario was the whole reason for inventing the DELAYED_STATUS mechanism, because it was impossible to do all the necessary work within the callback routine for a control-request interrupt handler.
If we want to keep usb_ep_disable in interrupt context, should we revise the wording such that gadget drivers must ensure pending requests are dequeued before calling usb_ep_disable()? That requests are expected to be given back before usb_ep_disable().
Or perhaps revert the commit above (commit b0d5d2a71641), fix the dwc3 driver for that, and gadget drivers need to follow the original requirement.
Function drivers would have to go to great lengths to guarantee that requests had completed before the endpoint is re-enabled. Right now their ->set_alt() callback routines are designed to run in interrupt context; they can't afford to wait for requests to complete.
Why is ->set_alt() designed for interrupt context? We can't expect requests to be completed before usb_ep_disable() completes _and_ also expect usb_ep_disable() be able to be called in interrupt context.
The easiest way out is for usb_ep_disable() to do what the kerneldoc says: ensure that pending requests do complete before it returns. Can dwc3 do this? (And what if at some time in the future we want to start
The dwc3 can do that, but we need to note that usb_ep_disable() must be executed in process context and might sleep. I suspect we may run into some issues from some function drivers that expected usb_ep_disable() to be executable in interrupt context.
using an asynchronous bottom half for request completions, like usbcore does for URBs?)
Which one are you referring to? From what I see, even the host side expected ->endpoint_disable to be executed in process context.
Perhaps we can introduce endpoint_flush() on gadget side for synchronization if we want to keep usb_ep_disable() to be asynchronous?
Let's face it; the situation is a mess.
Glad you're here to help with the mess :)
Thanks, Thinh
On Thu, Nov 20, 2025 at 02:07:33AM +0000, Thinh Nguyen wrote:
Function drivers would have to go to great lengths to guarantee that requests had completed before the endpoint is re-enabled. Right now their ->set_alt() callback routines are designed to run in interrupt context; they can't afford to wait for requests to complete.
Why is ->set_alt() designed for interrupt context? We can't expect requests to be completed before usb_ep_disable() completes _and_ also expect usb_ep_disable() be able to be called in interrupt context.
->set_alt() is called by the composite core when a Set-Interface or Set-Config control request arrives from the host. It happens within the composite_setup() handler, which is called by the UDC driver when a control request arrives, which means it happens in the context of the UDC driver's interrupt handler. Therefore ->set_alt() callbacks must not sleep.
The easiest way out is for usb_ep_disable() to do what the kerneldoc says: ensure that pending requests do complete before it returns. Can dwc3 do this? (And what if at some time in the future we want to start
The dwc3 can do that, but we need to note that usb_ep_disable() must be executed in process context and might sleep. I suspect we may run into some issues from some function drivers that expected usb_ep_disable() to be executable in interrupt context.
Well, that's part of what I meant to ask. Is it possible to wait for all pending requests to be given back while in interrupt context?
using an asynchronous bottom half for request completions, like usbcore does for URBs?)
Which one are you referring to? From what I see, even the host side expected ->endpoint_disable to be executed in process context.
I was referring to the way usb_hcd_giveback_urb() uses system_bh_wq or system_bh_highpri_wq to do its work. This makes it impossible for an interrupt handler to wait for a giveback to complete.
If the gadget core also switches over to using a work queue for request completions, it will then likewise become impossible for an interrupt handler to wait for a request to complete.
Perhaps we can introduce endpoint_flush() on gadget side for synchronization if we want to keep usb_ep_disable() to be asynchronous?
Let's face it; the situation is a mess.
Glad you're here to help with the mess :)
To do this right, I can't think of any approach other than to make the composite core use a work queue or other kernel thread for handling Set-Interface and Set-Config calls.
It would be nice if there was a way to invoke the ->set_alt() callback that would just disable the interface's endpoints without re-enabling anything. Then the composite core could disable the existing altsetting, flush the old endpoints, and call ->set_alt() a second time to install the new altsetting and enable the new endpoints. But implementing this would require us to update every function driver's ->set_alt() callback routine.
Without that ability, we will have to audit every function driver to make sure the ->set_alt() callbacks do ensure that endpoints are flushed before they are re-enabled.
There does not seem to be any way to fix the problem just by changing the gadget core.
Alan Stern
On Wed, Nov 19, 2025, Alan Stern wrote:
On Thu, Nov 20, 2025 at 02:07:33AM +0000, Thinh Nguyen wrote:
Function drivers would have to go to great lengths to guarantee that requests had completed before the endpoint is re-enabled. Right now their ->set_alt() callback routines are designed to run in interrupt context; they can't afford to wait for requests to complete.
Why is ->set_alt() designed for interrupt context? We can't expect requests to be completed before usb_ep_disable() completes _and_ also expect usb_ep_disable() be able to be called in interrupt context.
->set_alt() is called by the composite core when a Set-Interface or Set-Config control request arrives from the host. It happens within the composite_setup() handler, which is called by the UDC driver when a control request arrives, which means it happens in the context of the UDC driver's interrupt handler. Therefore ->set_alt() callbacks must not sleep.
This should be changed. I don't think we can expect set_alt() to be in interrupt context only.
The easiest way out is for usb_ep_disable() to do what the kerneldoc says: ensure that pending requests do complete before it returns. Can dwc3 do this? (And what if at some time in the future we want to start
The dwc3 can do that, but we need to note that usb_ep_disable() must be executed in process context and might sleep. I suspect we may run into some issues from some function drivers that expected usb_ep_disable() to be executable in interrupt context.
Well, that's part of what I meant to ask. Is it possible to wait for all pending requests to be given back while in interrupt context?
The dwc3 controller will need some time (usually less than 2ms) to flush the endpoints and give back requests. It's probably too long to have busy poll in interrupt context.
using an asynchronous bottom half for request completions, like usbcore does for URBs?)
Which one are you referring to? From what I see, even the host side expected ->endpoint_disable to be executed in process context.
I was referring to the way usb_hcd_giveback_urb() uses system_bh_wq or system_bh_highpri_wq to do its work. This makes it impossible for an interrupt handler to wait for a giveback to complete.
If the gadget core also switches over to using a work queue for request completions, it will then likewise become impossible for an interrupt handler to wait for a request to complete.
Perhaps we can introduce endpoint_flush() on gadget side for synchronization if we want to keep usb_ep_disable() to be asynchronous?
Let's face it; the situation is a mess.
Glad you're here to help with the mess :)
To do this right, I can't think of any approach other than to make the composite core use a work queue or other kernel thread for handling Set-Interface and Set-Config calls.
Sounds like it should've been like this initially.
It would be nice if there was a way to invoke the ->set_alt() callback that would just disable the interface's endpoints without re-enabling anything. Then the composite core could disable the existing altsetting, flush the old endpoints, and call ->set_alt() a second time to install the new altsetting and enable the new endpoints. But implementing this would require us to update every function driver's ->set_alt() callback routine.
Yeah..
Without that ability, we will have to audit every function driver to make sure the ->set_alt() callbacks do ensure that endpoints are flushed before they are re-enabled.
There does not seem to be any way to fix the problem just by changing the gadget core.
We can have a workaround in dwc3 that can temporarily "work" with what we have. However, eventually, we will need to properly rework this and audit the gadget drivers.
Thanks, Thinh
On Fri, Nov 21, 2025 at 02:22:02AM +0000, Thinh Nguyen wrote:
On Wed, Nov 19, 2025, Alan Stern wrote:
->set_alt() is called by the composite core when a Set-Interface or Set-Config control request arrives from the host. It happens within the composite_setup() handler, which is called by the UDC driver when a control request arrives, which means it happens in the context of the UDC driver's interrupt handler. Therefore ->set_alt() callbacks must not sleep.
This should be changed. I don't think we can expect set_alt() to be in interrupt context only.
Agreed.
To do this right, I can't think of any approach other than to make the composite core use a work queue or other kernel thread for handling Set-Interface and Set-Config calls.
Sounds like it should've been like this initially.
I guess the nobody thought through the issues very carefully at the time the composite framework was designed. Maybe the UDCs that existed back did not require a lot of time to flush endpoints; I can't remember.
Without that ability, we will have to audit every function driver to make sure the ->set_alt() callbacks do ensure that endpoints are flushed before they are re-enabled.
There does not seem to be any way to fix the problem just by changing the gadget core.
We can have a workaround in dwc3 that can temporarily "work" with what we have. However, eventually, we will need to properly rework this and audit the gadget drivers.
Clearly, the first step is to change the composite core. That can be done without messing up anything else. But yes, eventually the gadget drivers will have to be audited.
Alan Stern
On 11/21/2025 8:38 AM, Alan Stern wrote:
On Fri, Nov 21, 2025 at 02:22:02AM +0000, Thinh Nguyen wrote:
On Wed, Nov 19, 2025, Alan Stern wrote:
->set_alt() is called by the composite core when a Set-Interface or Set-Config control request arrives from the host. It happens within the composite_setup() handler, which is called by the UDC driver when a control request arrives, which means it happens in the context of the UDC driver's interrupt handler. Therefore ->set_alt() callbacks must not sleep.
This should be changed. I don't think we can expect set_alt() to be in interrupt context only.
Agreed.
To do this right, I can't think of any approach other than to make the composite core use a work queue or other kernel thread for handling Set-Interface and Set-Config calls.
Sounds like it should've been like this initially.
I guess the nobody thought through the issues very carefully at the time the composite framework was designed. Maybe the UDCs that existed back did not require a lot of time to flush endpoints; I can't remember.
Without that ability, we will have to audit every function driver to make sure the ->set_alt() callbacks do ensure that endpoints are flushed before they are re-enabled.
There does not seem to be any way to fix the problem just by changing the gadget core.
We can have a workaround in dwc3 that can temporarily "work" with what we have. However, eventually, we will need to properly rework this and audit the gadget drivers.
Clearly, the first step is to change the composite core. That can be done without messing up anything else. But yes, eventually the gadget drivers will have to be audited.
Alan Stern
Hi Thinh,
Do you have any suggestions that might be helpful for us to try on our side? This EP resource‑conflict problem becomes easily observable when the RNDIS network test executing ifconfig rndis0 down/up is run repeatedly on the device side.
Thanks, Selva
On Wed, Dec 03, 2025, Selvarasu Ganesan wrote:
On 11/21/2025 8:38 AM, Alan Stern wrote:
On Fri, Nov 21, 2025 at 02:22:02AM +0000, Thinh Nguyen wrote:
On Wed, Nov 19, 2025, Alan Stern wrote:
->set_alt() is called by the composite core when a Set-Interface or Set-Config control request arrives from the host. It happens within the composite_setup() handler, which is called by the UDC driver when a control request arrives, which means it happens in the context of the UDC driver's interrupt handler. Therefore ->set_alt() callbacks must not sleep.
This should be changed. I don't think we can expect set_alt() to be in interrupt context only.
Agreed.
To do this right, I can't think of any approach other than to make the composite core use a work queue or other kernel thread for handling Set-Interface and Set-Config calls.
Sounds like it should've been like this initially.
I guess the nobody thought through the issues very carefully at the time the composite framework was designed. Maybe the UDCs that existed back did not require a lot of time to flush endpoints; I can't remember.
Without that ability, we will have to audit every function driver to make sure the ->set_alt() callbacks do ensure that endpoints are flushed before they are re-enabled.
There does not seem to be any way to fix the problem just by changing the gadget core.
We can have a workaround in dwc3 that can temporarily "work" with what we have. However, eventually, we will need to properly rework this and audit the gadget drivers.
Clearly, the first step is to change the composite core. That can be done without messing up anything else. But yes, eventually the gadget drivers will have to be audited.
Alan Stern
Hi Thinh,
Do you have any suggestions that might be helpful for us to try on our side? This EP resource‑conflict problem becomes easily observable when the RNDIS network test executing ifconfig rndis0 down/up is run repeatedly on the device side.
Thanks, Selva
At the moment, I can't think of a way to workaround for all cases. Let's just leave bulk streams alone for now. Until we have proper fixes to the gadget framework, let's just try the below.
Thanks, Thinh
diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c index 3830aa2c10a9..974573304441 100644 --- a/drivers/usb/dwc3/gadget.c +++ b/drivers/usb/dwc3/gadget.c @@ -960,11 +960,18 @@ static int __dwc3_gadget_ep_enable(struct dwc3_ep *dep, unsigned int action) }
/* - * Issue StartTransfer here with no-op TRB so we can always rely on No - * Response Update Transfer command. + * For streams, at start, there maybe a race where the + * host primes the endpoint before the function driver + * queues a request to initiate a stream. In that case, + * the controller will not see the prime to generate the + * ERDY and start stream. To workaround this, issue a + * no-op TRB as normal, but end it immediately. As a + * result, when the function driver queues the request, + * the next START_TRANSFER command will cause the + * controller to generate an ERDY to initiate the + * stream. */ - if (usb_endpoint_xfer_bulk(desc) || - usb_endpoint_xfer_int(desc)) { + if (dep->stream_capable) { struct dwc3_gadget_ep_cmd_params params; struct dwc3_trb *trb; dma_addr_t trb_dma; @@ -983,35 +990,21 @@ static int __dwc3_gadget_ep_enable(struct dwc3_ep *dep, unsigned int action) if (ret < 0) return ret;
- if (dep->stream_capable) { - /* - * For streams, at start, there maybe a race where the - * host primes the endpoint before the function driver - * queues a request to initiate a stream. In that case, - * the controller will not see the prime to generate the - * ERDY and start stream. To workaround this, issue a - * no-op TRB as normal, but end it immediately. As a - * result, when the function driver queues the request, - * the next START_TRANSFER command will cause the - * controller to generate an ERDY to initiate the - * stream. - */ - dwc3_stop_active_transfer(dep, true, true); + dwc3_stop_active_transfer(dep, true, true);
- /* - * All stream eps will reinitiate stream on NoStream - * rejection. - * - * However, if the controller is capable of - * TXF_FLUSH_BYPASS, then IN direction endpoints will - * automatically restart the stream without the driver - * initiation. - */ - if (!dep->direction || - !(dwc->hwparams.hwparams9 & - DWC3_GHWPARAMS9_DEV_TXF_FLUSH_BYPASS)) - dep->flags |= DWC3_EP_FORCE_RESTART_STREAM; - } + /* + * All stream eps will reinitiate stream on NoStream + * rejection. + * + * However, if the controller is capable of + * TXF_FLUSH_BYPASS, then IN direction endpoints will + * automatically restart the stream without the driver + * initiation. + */ + if (!dep->direction || + !(dwc->hwparams.hwparams9 & + DWC3_GHWPARAMS9_DEV_TXF_FLUSH_BYPASS)) + dep->flags |= DWC3_EP_FORCE_RESTART_STREAM; }
out:
On 12/4/2025 7:21 AM, Thinh Nguyen wrote:
On Wed, Dec 03, 2025, Selvarasu Ganesan wrote:
On 11/21/2025 8:38 AM, Alan Stern wrote:
On Fri, Nov 21, 2025 at 02:22:02AM +0000, Thinh Nguyen wrote:
On Wed, Nov 19, 2025, Alan Stern wrote:
->set_alt() is called by the composite core when a Set-Interface or Set-Config control request arrives from the host. It happens within the composite_setup() handler, which is called by the UDC driver when a control request arrives, which means it happens in the context of the UDC driver's interrupt handler. Therefore ->set_alt() callbacks must not sleep.
This should be changed. I don't think we can expect set_alt() to be in interrupt context only.
Agreed.
To do this right, I can't think of any approach other than to make the composite core use a work queue or other kernel thread for handling Set-Interface and Set-Config calls.
Sounds like it should've been like this initially.
I guess the nobody thought through the issues very carefully at the time the composite framework was designed. Maybe the UDCs that existed back did not require a lot of time to flush endpoints; I can't remember.
Without that ability, we will have to audit every function driver to make sure the ->set_alt() callbacks do ensure that endpoints are flushed before they are re-enabled.
There does not seem to be any way to fix the problem just by changing the gadget core.
We can have a workaround in dwc3 that can temporarily "work" with what we have. However, eventually, we will need to properly rework this and audit the gadget drivers.
Clearly, the first step is to change the composite core. That can be done without messing up anything else. But yes, eventually the gadget drivers will have to be audited.
Alan Stern
Hi Thinh,
Do you have any suggestions that might be helpful for us to try on our side? This EP resource‑conflict problem becomes easily observable when the RNDIS network test executing ifconfig rndis0 down/up is run repeatedly on the device side.
Thanks, Selva
At the moment, I can't think of a way to workaround for all cases. Let's just leave bulk streams alone for now. Until we have proper fixes to the gadget framework, let's just try the below.
Thanks, Thinh
diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c index 3830aa2c10a9..974573304441 100644 --- a/drivers/usb/dwc3/gadget.c +++ b/drivers/usb/dwc3/gadget.c @@ -960,11 +960,18 @@ static int __dwc3_gadget_ep_enable(struct dwc3_ep *dep, unsigned int action) } /*
* Issue StartTransfer here with no-op TRB so we can always rely on No* Response Update Transfer command.
* For streams, at start, there maybe a race where the* host primes the endpoint before the function driver* queues a request to initiate a stream. In that case,* the controller will not see the prime to generate the* ERDY and start stream. To workaround this, issue a* no-op TRB as normal, but end it immediately. As a* result, when the function driver queues the request,* the next START_TRANSFER command will cause the* controller to generate an ERDY to initiate the */* stream.
- if (usb_endpoint_xfer_bulk(desc) ||
usb_endpoint_xfer_int(desc)) {
- if (dep->stream_capable) { struct dwc3_gadget_ep_cmd_params params; struct dwc3_trb *trb; dma_addr_t trb_dma;
@@ -983,35 +990,21 @@ static int __dwc3_gadget_ep_enable(struct dwc3_ep *dep, unsigned int action) if (ret < 0) return ret;
if (dep->stream_capable) {/** For streams, at start, there maybe a race where the* host primes the endpoint before the function driver* queues a request to initiate a stream. In that case,* the controller will not see the prime to generate the* ERDY and start stream. To workaround this, issue a* no-op TRB as normal, but end it immediately. As a* result, when the function driver queues the request,* the next START_TRANSFER command will cause the* controller to generate an ERDY to initiate the* stream.*/dwc3_stop_active_transfer(dep, true, true);
dwc3_stop_active_transfer(dep, true, true);
/** All stream eps will reinitiate stream on NoStream* rejection.** However, if the controller is capable of* TXF_FLUSH_BYPASS, then IN direction endpoints will* automatically restart the stream without the driver* initiation.*/if (!dep->direction ||!(dwc->hwparams.hwparams9 &DWC3_GHWPARAMS9_DEV_TXF_FLUSH_BYPASS))dep->flags |= DWC3_EP_FORCE_RESTART_STREAM;}
/** All stream eps will reinitiate stream on NoStream* rejection.** However, if the controller is capable of* TXF_FLUSH_BYPASS, then IN direction endpoints will* automatically restart the stream without the driver* initiation.*/if (!dep->direction ||!(dwc->hwparams.hwparams9 &DWC3_GHWPARAMS9_DEV_TXF_FLUSH_BYPASS)) }dep->flags |= DWC3_EP_FORCE_RESTART_STREAM;out:
Hi Thinh,
Thanks for the changes. We understand the given fix and have verified that the original issue is resolved, but a similar below warning appears again in `dwc3_gadget_ep_queue` when we run a long duration our test. And we confirmed this is not due to this new given changes.
This warning is caused by a race between `dwc3_gadget_ep_disable` and `dwc3_gadget_ep_queue` that manipulates `dep->flags`.
Please refer the below sequence for the reference.
The warning originates from a race condition between dwc3_gadget_ep_disable and dwc3_send_gadget_ep_cmd from dwc3_gadget_ep_queue that both manipulate dep->flags. Proper synchronization or a check is needed when masking (dep->flags &= mask) inside dwc3_gadget_ep_disable.
Thread#1: usb_ep_queue ->dwc3_gadget_ep_queue ->__dwc3_gadget_kick_transfer -> starting = !(dep->flags & DWC3_EP_TRANSFER_STARTED); ->if(starting) ->dwc3_send_gadget_ep_cmd (cmd = DWC3_DEPCMD_STARTTRANSFER) ->dep->flags |= DWC3_EP_TRANSFER_STARTED;
Thread#2: dwc3_gadget_ep_disable ->__dwc3_gadget_ep_disable ->dwc3_remove_requests ->dwc3_stop_active_transfer ->__dwc3_stop_active_transfer -> dwc3_send_gadget_ep_cmd (cmd =DWC3_DEPCMD_ENDTRANSFER) ->if(!interrupt)dep->flags &= ~DWC3_EP_TRANSFER_STARTED; ... While Thread#2 is still running, Thread #1 starts again:
Thread#1: usb_ep_queue ->dwc3_gadget_ep_queue ->__dwc3_gadget_kick_transfer -> starting = !(dep->flags & DWC3_EP_TRANSFER_STARTED); ->if(starting) ->dwc3_send_gadget_ep_cmd (cmd = DWC3_DEPCMD_STARTTRANSFER) ->dep->flags |= DWC3_EP_TRANSFER_STARTED; ... ###### Thread#2: Continuation (race) ->__dwc3_gadget_ep_disable ->mask = DWC3_EP_TXFIFO_RESIZED | DWC3_EP_RESOURCE_ALLOCATED; ->dep->flags &= mask; --> // Possible of clears DWC3_EP_TRANSFER_STARTED flag as well without sending DWC3_DEPCMD_ENDTRANSFER
Thread#1:(Failure due to race) usb_ep_queue ->dwc3_gadget_ep_queue ->__dwc3_gadget_kick_transfer -> starting = !(dep->flags & DWC3_EP_TRANSFER_STARTED); ->if(starting) ->dwc3_send_gadget_ep_cmd (cmd = DWC3_DEPCMD_STARTTRANSFER) -> Results in “No resource” allocation error because the previous transfer was never end with ENDTRANSFER.
------------[ cut here ]------------ dwc3 13200000.dwc3: No resource for ep1in WARNING: CPU: 7 PID: 1748 at drivers/usb/dwc3/gadget.c:398 dwc3_send_gadget_ep_cmd+0x2f8/0x76c pc : dwc3_send_gadget_ep_cmd+0x2f8/0x76c lr : dwc3_send_gadget_ep_cmd+0x2f8/0x76c Call trace: dwc3_send_gadget_ep_cmd+0x2f8/0x76c __dwc3_gadget_kick_transfer+0x2ec/0x3f4 dwc3_gadget_ep_queue+0x140/0x1f0 usb_ep_queue+0x60/0xec mp_tx_task+0x100/0x134 mp_tx_timeout+0xd0/0x1e0 __hrtimer_run_queues+0x130/0x318 hrtimer_interrupt+0xe8/0x340 exynos_mct_comp_isr+0x58/0x80 __handle_irq_event_percpu+0xcc/0x25c handle_irq_event+0x40/0x9c handle_fasteoi_irq+0x154/0x2c8 generic_handle_domain_irq+0x58/0x80 gic_handle_irq+0x48/0x104 call_on_irq_stack+0x3c/0x50 do_interrupt_handler+0x4c/0x84 el1_interrupt+0x34/0x58 el1h_64_irq_handler+0x18/0x24 el1h_64_irq+0x68/0x6c _raw_spin_unlock_irqrestore+0xc/0x4c binder_wakeup_thread_ilocked+0x50/0xb4 binder_proc_transaction+0x308/0x6ec binder_transaction+0x1aec/0x23b0 binder_ioctl_write_read+0xa28/0x2534 binder_ioctl+0x1fc/0xb3c __arm64_sys_ioctl+0xa8/0xe4 invoke_syscall+0x58/0x10c el0_svc_common+0x80/0xdc do_el0_svc+0x1c/0x28 el0_svc+0x38/0x88 el0t_64_sync_handler+0x70/0xbc el0t_64_sync+0x1a8/0x1ac ---[ end trace 0000000000000000 ]---
Thanks, Selva
On Thu, Dec 04, 2025, Selvarasu Ganesan wrote:
On 12/4/2025 7:21 AM, Thinh Nguyen wrote:
At the moment, I can't think of a way to workaround for all cases. Let's just leave bulk streams alone for now. Until we have proper fixes to the gadget framework, let's just try the below.
Hi Thinh,
Thanks for the changes. We understand the given fix and have verified that the original issue is resolved, but a similar below warning appears again in `dwc3_gadget_ep_queue` when we run a long duration our test. And we confirmed this is not due to this new given changes.
This warning is caused by a race between `dwc3_gadget_ep_disable` and `dwc3_gadget_ep_queue` that manipulates `dep->flags`.
Please refer the below sequence for the reference.
The warning originates from a race condition between dwc3_gadget_ep_disable and dwc3_send_gadget_ep_cmd from dwc3_gadget_ep_queue that both manipulate dep->flags. Proper synchronization or a check is needed when masking (dep->flags &= mask) inside dwc3_gadget_ep_disable.
I was hoping that the dwc3_gadget_ep_queue() won't come early to run into this scenario. What I've provided will only mitigate and will not resolve for all cases. It seems adding more checks in dwc3 will be more messy.
Probably we should try rework the usb gadget framework instead of workaround the problem in dwc3. Here is a potential solution I'm thinking: introduce usb_ep_disable_with_flush().
diff --git a/drivers/usb/gadget/udc/core.c b/drivers/usb/gadget/udc/core.c index 0474c92ac41a..f5bb064c2e9c 100644 --- a/drivers/usb/gadget/udc/core.c +++ b/drivers/usb/gadget/udc/core.c @@ -144,10 +144,9 @@ EXPORT_SYMBOL_GPL(usb_ep_enable); * usb_ep_disable - endpoint is no longer usable * @ep:the endpoint being unconfigured. may not be the endpoint named "ep0". * - * no other task may be using this endpoint when this is called. - * any pending and uncompleted requests will complete with status - * indicating disconnect (-ESHUTDOWN) before this call returns. - * gadget drivers must call usb_ep_enable() again before queueing + * No other task may be using this endpoint when this is called. + * Pending and incompleted requests must be flushed before executing + * this. Gadget drivers must call usb_ep_enable() again before queueing * requests to the endpoint. * * This routine may be called in an atomic (interrupt) context. @@ -174,6 +173,40 @@ int usb_ep_disable(struct usb_ep *ep) } EXPORT_SYMBOL_GPL(usb_ep_disable);
+/** + * usb_ep_disable_with_flush - disable and flush endpoint + * @ep: the non-control endpoint to be disabled and flushed. + * + * No other task may be using this endpoint when this is called. + * Any pending and incompleted requests will be completed with status + * indicating disconnect (-ESHUTDOWN) before this call returns. + * gadget drivers must call usb_ep_enable() again before queueing + * requests to the endpoint. + * + * This routine must be called in process context. + * + * returns zero, or a negative error code. + */ +int usb_ep_disable_with_flush(struct usb_ep *ep) +{ + int ret = 0; + + if (!ep->enabled) + goto out; + + if (!ep->ops->disable_with_flush) + return -EOPNOTSUPP; + + ep->enabled = false; + + ret = ep->ops->disable_with_flush(ep); +out: + trace_usb_ep_disable_with_flush(ep, ret); + + return ret; +} +EXPORT_SYMBOL_GPL(usb_ep_disable_with_flush); + /** * usb_ep_alloc_request - allocate a request object to use with this endpoint * @ep:the endpoint to be used with with the request
---
Then we'll gradually start auditing and replacing usb_ep_disable() with usb_ep_disable_with_flush() where needed. We'll also need to rework the composite framework for set_alt() to be run in process context.
BR, Thinh
On Fri, Dec 05, 2025, Thinh Nguyen wrote:
On Thu, Dec 04, 2025, Selvarasu Ganesan wrote:
On 12/4/2025 7:21 AM, Thinh Nguyen wrote:
At the moment, I can't think of a way to workaround for all cases. Let's just leave bulk streams alone for now. Until we have proper fixes to the gadget framework, let's just try the below.
Hi Thinh,
Thanks for the changes. We understand the given fix and have verified that the original issue is resolved, but a similar below warning appears again in `dwc3_gadget_ep_queue` when we run a long duration our test. And we confirmed this is not due to this new given changes.
This warning is caused by a race between `dwc3_gadget_ep_disable` and `dwc3_gadget_ep_queue` that manipulates `dep->flags`.
Please refer the below sequence for the reference.
The warning originates from a race condition between dwc3_gadget_ep_disable and dwc3_send_gadget_ep_cmd from dwc3_gadget_ep_queue that both manipulate dep->flags. Proper synchronization or a check is needed when masking (dep->flags &= mask) inside dwc3_gadget_ep_disable.
I was hoping that the dwc3_gadget_ep_queue() won't come early to run into this scenario. What I've provided will only mitigate and will not resolve for all cases. It seems adding more checks in dwc3 will be more messy.
Probably we should try rework the usb gadget framework instead of workaround the problem in dwc3. Here is a potential solution I'm thinking: introduce usb_ep_disable_with_flush().
Actually, no. Let's just revert this:
b0d5d2a71641 ("usb: gadget: udc: core: Revise comments for USB ep enable/disable")
Reword the implementation in dwc3 and audit where usb_ep_disable() is used.
Thanks, Thinh
linux-stable-mirror@lists.linaro.org