On 12/4/2025 7:21 AM, Thinh Nguyen wrote:
On Wed, Dec 03, 2025, Selvarasu Ganesan wrote:
On 11/21/2025 8:38 AM, Alan Stern wrote:
On Fri, Nov 21, 2025 at 02:22:02AM +0000, Thinh Nguyen wrote:
On Wed, Nov 19, 2025, Alan Stern wrote:
->set_alt() is called by the composite core when a Set-Interface or Set-Config control request arrives from the host. It happens within the composite_setup() handler, which is called by the UDC driver when a control request arrives, which means it happens in the context of the UDC driver's interrupt handler. Therefore ->set_alt() callbacks must not sleep.
This should be changed. I don't think we can expect set_alt() to be in interrupt context only.
Agreed.
To do this right, I can't think of any approach other than to make the composite core use a work queue or other kernel thread for handling Set-Interface and Set-Config calls.
Sounds like it should've been like this initially.
I guess the nobody thought through the issues very carefully at the time the composite framework was designed. Maybe the UDCs that existed back did not require a lot of time to flush endpoints; I can't remember.
Without that ability, we will have to audit every function driver to make sure the ->set_alt() callbacks do ensure that endpoints are flushed before they are re-enabled.
There does not seem to be any way to fix the problem just by changing the gadget core.
We can have a workaround in dwc3 that can temporarily "work" with what we have. However, eventually, we will need to properly rework this and audit the gadget drivers.
Clearly, the first step is to change the composite core. That can be done without messing up anything else. But yes, eventually the gadget drivers will have to be audited.
Alan Stern
Hi Thinh,
Do you have any suggestions that might be helpful for us to try on our side? This EP resource‑conflict problem becomes easily observable when the RNDIS network test executing ifconfig rndis0 down/up is run repeatedly on the device side.
Thanks, Selva
At the moment, I can't think of a way to workaround for all cases. Let's just leave bulk streams alone for now. Until we have proper fixes to the gadget framework, let's just try the below.
Thanks, Thinh
diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c index 3830aa2c10a9..974573304441 100644 --- a/drivers/usb/dwc3/gadget.c +++ b/drivers/usb/dwc3/gadget.c @@ -960,11 +960,18 @@ static int __dwc3_gadget_ep_enable(struct dwc3_ep *dep, unsigned int action) } /*
* Issue StartTransfer here with no-op TRB so we can always rely on No* Response Update Transfer command.
* For streams, at start, there maybe a race where the* host primes the endpoint before the function driver* queues a request to initiate a stream. In that case,* the controller will not see the prime to generate the* ERDY and start stream. To workaround this, issue a* no-op TRB as normal, but end it immediately. As a* result, when the function driver queues the request,* the next START_TRANSFER command will cause the* controller to generate an ERDY to initiate the */* stream.
- if (usb_endpoint_xfer_bulk(desc) ||
usb_endpoint_xfer_int(desc)) {
- if (dep->stream_capable) { struct dwc3_gadget_ep_cmd_params params; struct dwc3_trb *trb; dma_addr_t trb_dma;
@@ -983,35 +990,21 @@ static int __dwc3_gadget_ep_enable(struct dwc3_ep *dep, unsigned int action) if (ret < 0) return ret;
if (dep->stream_capable) {/** For streams, at start, there maybe a race where the* host primes the endpoint before the function driver* queues a request to initiate a stream. In that case,* the controller will not see the prime to generate the* ERDY and start stream. To workaround this, issue a* no-op TRB as normal, but end it immediately. As a* result, when the function driver queues the request,* the next START_TRANSFER command will cause the* controller to generate an ERDY to initiate the* stream.*/dwc3_stop_active_transfer(dep, true, true);
dwc3_stop_active_transfer(dep, true, true);
/** All stream eps will reinitiate stream on NoStream* rejection.** However, if the controller is capable of* TXF_FLUSH_BYPASS, then IN direction endpoints will* automatically restart the stream without the driver* initiation.*/if (!dep->direction ||!(dwc->hwparams.hwparams9 &DWC3_GHWPARAMS9_DEV_TXF_FLUSH_BYPASS))dep->flags |= DWC3_EP_FORCE_RESTART_STREAM;}
/** All stream eps will reinitiate stream on NoStream* rejection.** However, if the controller is capable of* TXF_FLUSH_BYPASS, then IN direction endpoints will* automatically restart the stream without the driver* initiation.*/if (!dep->direction ||!(dwc->hwparams.hwparams9 &DWC3_GHWPARAMS9_DEV_TXF_FLUSH_BYPASS)) }dep->flags |= DWC3_EP_FORCE_RESTART_STREAM;out:
Hi Thinh,
Thanks for the changes. We understand the given fix and have verified that the original issue is resolved, but a similar below warning appears again in `dwc3_gadget_ep_queue` when we run a long duration our test. And we confirmed this is not due to this new given changes.
This warning is caused by a race between `dwc3_gadget_ep_disable` and `dwc3_gadget_ep_queue` that manipulates `dep->flags`.
Please refer the below sequence for the reference.
The warning originates from a race condition between dwc3_gadget_ep_disable and dwc3_send_gadget_ep_cmd from dwc3_gadget_ep_queue that both manipulate dep->flags. Proper synchronization or a check is needed when masking (dep->flags &= mask) inside dwc3_gadget_ep_disable.
Thread#1: usb_ep_queue ->dwc3_gadget_ep_queue ->__dwc3_gadget_kick_transfer -> starting = !(dep->flags & DWC3_EP_TRANSFER_STARTED); ->if(starting) ->dwc3_send_gadget_ep_cmd (cmd = DWC3_DEPCMD_STARTTRANSFER) ->dep->flags |= DWC3_EP_TRANSFER_STARTED;
Thread#2: dwc3_gadget_ep_disable ->__dwc3_gadget_ep_disable ->dwc3_remove_requests ->dwc3_stop_active_transfer ->__dwc3_stop_active_transfer -> dwc3_send_gadget_ep_cmd (cmd =DWC3_DEPCMD_ENDTRANSFER) ->if(!interrupt)dep->flags &= ~DWC3_EP_TRANSFER_STARTED; ... While Thread#2 is still running, Thread #1 starts again:
Thread#1: usb_ep_queue ->dwc3_gadget_ep_queue ->__dwc3_gadget_kick_transfer -> starting = !(dep->flags & DWC3_EP_TRANSFER_STARTED); ->if(starting) ->dwc3_send_gadget_ep_cmd (cmd = DWC3_DEPCMD_STARTTRANSFER) ->dep->flags |= DWC3_EP_TRANSFER_STARTED; ... ###### Thread#2: Continuation (race) ->__dwc3_gadget_ep_disable ->mask = DWC3_EP_TXFIFO_RESIZED | DWC3_EP_RESOURCE_ALLOCATED; ->dep->flags &= mask; --> // Possible of clears DWC3_EP_TRANSFER_STARTED flag as well without sending DWC3_DEPCMD_ENDTRANSFER
Thread#1:(Failure due to race) usb_ep_queue ->dwc3_gadget_ep_queue ->__dwc3_gadget_kick_transfer -> starting = !(dep->flags & DWC3_EP_TRANSFER_STARTED); ->if(starting) ->dwc3_send_gadget_ep_cmd (cmd = DWC3_DEPCMD_STARTTRANSFER) -> Results in “No resource” allocation error because the previous transfer was never end with ENDTRANSFER.
------------[ cut here ]------------ dwc3 13200000.dwc3: No resource for ep1in WARNING: CPU: 7 PID: 1748 at drivers/usb/dwc3/gadget.c:398 dwc3_send_gadget_ep_cmd+0x2f8/0x76c pc : dwc3_send_gadget_ep_cmd+0x2f8/0x76c lr : dwc3_send_gadget_ep_cmd+0x2f8/0x76c Call trace: dwc3_send_gadget_ep_cmd+0x2f8/0x76c __dwc3_gadget_kick_transfer+0x2ec/0x3f4 dwc3_gadget_ep_queue+0x140/0x1f0 usb_ep_queue+0x60/0xec mp_tx_task+0x100/0x134 mp_tx_timeout+0xd0/0x1e0 __hrtimer_run_queues+0x130/0x318 hrtimer_interrupt+0xe8/0x340 exynos_mct_comp_isr+0x58/0x80 __handle_irq_event_percpu+0xcc/0x25c handle_irq_event+0x40/0x9c handle_fasteoi_irq+0x154/0x2c8 generic_handle_domain_irq+0x58/0x80 gic_handle_irq+0x48/0x104 call_on_irq_stack+0x3c/0x50 do_interrupt_handler+0x4c/0x84 el1_interrupt+0x34/0x58 el1h_64_irq_handler+0x18/0x24 el1h_64_irq+0x68/0x6c _raw_spin_unlock_irqrestore+0xc/0x4c binder_wakeup_thread_ilocked+0x50/0xb4 binder_proc_transaction+0x308/0x6ec binder_transaction+0x1aec/0x23b0 binder_ioctl_write_read+0xa28/0x2534 binder_ioctl+0x1fc/0xb3c __arm64_sys_ioctl+0xa8/0xe4 invoke_syscall+0x58/0x10c el0_svc_common+0x80/0xdc do_el0_svc+0x1c/0x28 el0_svc+0x38/0x88 el0t_64_sync_handler+0x70/0xbc el0t_64_sync+0x1a8/0x1ac ---[ end trace 0000000000000000 ]---
Thanks, Selva