On Tue, Oct 18, 2022, Jeffrey Vanhoof wrote:
Hi Thinh,
On Tue, Oct 18, 2022 at 06:45:40PM +0000, Thinh Nguyen wrote:
Hi Dan,
On Mon, Oct 17, 2022, Dan Vacura wrote:
Hi Thinh,
On Mon, Oct 17, 2022 at 09:30:38PM +0000, Thinh Nguyen wrote:
On Mon, Oct 17, 2022, Dan Vacura wrote:
From: Jeff Vanhoof <qjv001@motorola.com>
arm-smmu related crashes are seen after a Missed ISOC interrupt when no_interrupt=1 is used. This can happen if the hardware is still using the data associated with a TRB after the usb_request's ->complete call has been made. Instead of immediately releasing a request when a Missed ISOC interrupt has occurred, this change adds logic to cancel the request, so that it is eventually released once the END_TRANSFER command has completed. This logic is similar to some of the cleanup done in dwc3_gadget_ep_dequeue.
This doesn't sound right. How did you determine that the hardware is still using the data associated with the TRB? Did you check the TRB's HWO bit?
The problem we're seeing was mentioned in the summary of this patch series, issue #1. Basically, with the following patch https://urldefense.com/v3/__https://patchwork.kernel.org/project/linux-usb/p... integrated, an smmu panic occurs on our Android device with the 5.15 kernel:
<3>[ 718.314900][ T803] arm-smmu 15000000.apps-smmu: Unhandled arm-smmu context fault from a600000.dwc3!
The uvc gadget driver appears to be the first (and only) gadget that uses the no_interrupt=1 logic, so this seems to be a new condition for the dwc3 driver. In our configuration, we have up to 64 requests and no_interrupt=1 for up to 15 requests. The list size of dep->started_list would get up to that amount when looping through to clean up the completed requests. From testing and debugging, the smmu panic occurs when a -EXDEV status shows up and right after dwc3_gadget_ep_cleanup_completed_request() was visited. Our conclusion was that the requests were being returned to the gadget too early.
As I mentioned, if the status is updated to missed isoc, that means that the controller returned ownership of the TRB to the driver. At least for the particular request with -EXDEV, its TRBs are completed. I'm not clear on your conclusion.
Do we know where the crash occurred? Is it from the dwc3 driver or from the uvc driver, and at what line? It'd be great if we can see the driver log.
To interject, what should happen in dwc3_gadget_ep_reclaim_completed_trb if the IOC bit is not set (but the IMI bit is) and -EXDEV status is passed into it?
Hm... we may have overlooked this case for the no_interrupt scenario. If IMI is set, then there will be an interrupt when there's a missed isoc, regardless of whether no_interrupt is set by the gadget driver.
If the function returns 0, another attempt to reclaim may occur. If this happens and the next request did have the HWO bit set, the function would return 1 but dwc3_gadget_ep_cleanup_completed_request would still call dwc3_gadget_giveback.
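To make that sequence concrete, here is a toy model in plain C. It is not the dwc3 code: the `model_` names, the one-TRB-per-request layout and the flag values are all assumptions made for illustration. It only mirrors the behaviour described above, namely that a return of 0 lets the loop move on to the next request, and that the giveback still happens when the reclaim returns 1 because HWO is set.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the TRB control bits. */
#define MODEL_TRB_HWO (1u << 0)	/* hardware still owns the TRB */
#define MODEL_TRB_IOC (1u << 1)	/* interrupt on completion requested */

struct model_req {
	unsigned int trb_ctrl;	/* one TRB per request in this model */
	bool given_back;	/* set once the request is returned to the gadget */
};

/* Models the reclaim decision: non-zero means "stop walking the list". */
static int model_reclaim_trb(const struct model_req *req)
{
	if (req->trb_ctrl & MODEL_TRB_HWO)
		return 1;
	if (req->trb_ctrl & MODEL_TRB_IOC)
		return 1;
	return 0;
}

/* Models the per-request cleanup: the giveback happens whatever ret is. */
static int model_cleanup_request(struct model_req *req)
{
	int ret = model_reclaim_trb(req);

	req->given_back = true;
	return ret;
}

/* Walks the list until a cleanup returns non-zero; returns how many
 * requests were given back along the way. */
static size_t model_cleanup_requests(struct model_req *reqs, size_t n)
{
	size_t i, given = 0;

	assert(reqs != NULL || n == 0);
	for (i = 0; i < n; i++) {
		int ret = model_cleanup_request(&reqs[i]);

		given++;
		if (ret)
			break;
	}
	return given;
}
```

With a list of three requests (the first completed without IOC, the other two with HWO still set), model_cleanup_requests() gives back two requests in this model: the second one is returned even though its TRB is still owned by the hardware.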
As a test (without this patch), I added a check to see if the HWO bit was set in dwc3_gadget_ep_cleanup_completed_requests(). If the use case was ISOC and the HWO bit was set, I avoided calling dwc3_gadget_ep_cleanup_completed_request(). This also seemed to avoid the iommu related crash being seen.
Is there an issue in this area that needs to be corrected instead? Not having interrupts set for each request may be causing some new issues to be uncovered.
As for the crash seen without this patch, no good stack trace is given. The line reported for the crash varied a bit, but tended to be towards the end of dwc3_stop_active_transfer() or dwc3_gadget_endpoint_trbs_complete().
Since dwc3_gadget_endpoint_trbs_complete() can be called from multiple locations, I duplicated the function to help identify which path it was likely being called from. At the time of the crashes seen, dwc3_gadget_endpoint_transfer_in_progress() appeared to be the caller.
dwc3_gadget_endpoint_transfer_in_progress()
  ->dwc3_gadget_endpoint_trbs_complete() (crashed towards end of here)
    ->dwc3_stop_active_transfer() (sometimes crashed towards end of here)
I hope this clarifies things a bit.
Can we try this? Let me know if it resolves your issue.
diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c
index 61fba2b7389b..8352f4b5dd9f 100644
--- a/drivers/usb/dwc3/gadget.c
+++ b/drivers/usb/dwc3/gadget.c
@@ -3657,6 +3657,10 @@ static int dwc3_gadget_ep_reclaim_completed_trb(struct dwc3_ep *dep,
 	if (event->status & DEPEVT_STATUS_SHORT && !chain)
 		return 1;
 
+	if (usb_endpoint_xfer_isoc(dep->endpoint.desc) &&
+	    (event->status & DEPEVT_STATUS_MISSED_ISOC) && !chain)
+		return 1;
+
 	if ((trb->ctrl & DWC3_TRB_CTRL_IOC) ||
 	    (trb->ctrl & DWC3_TRB_CTRL_LST))
 		return 1;
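For reference, the return decision with the check from the diff above can be read as a standalone predicate. The sketch below is only an illustration: the `model_` names and flag values are hypothetical (the real definitions live in the dwc3 headers), and the earlier checks in dwc3_gadget_ep_reclaim_completed_trb() that are not part of this hunk are left out.

```c
#include <stdbool.h>

/* Hypothetical flag values for illustration only. */
#define MODEL_STATUS_SHORT		(1u << 0)
#define MODEL_STATUS_MISSED_ISOC	(1u << 1)
#define MODEL_CTRL_IOC			(1u << 0)
#define MODEL_CTRL_LST			(1u << 1)

/* Returns 1 when the reclaim loop should stop at this TRB, 0 otherwise. */
static int model_reclaim_decision(bool is_isoc, unsigned int event_status,
				  unsigned int trb_ctrl, bool chain)
{
	/* Short transfer on an unchained TRB. */
	if ((event_status & MODEL_STATUS_SHORT) && !chain)
		return 1;

	/* Added check: missed isoc on an unchained TRB of an isoc endpoint. */
	if (is_isoc && (event_status & MODEL_STATUS_MISSED_ISOC) && !chain)
		return 1;

	/* TRB asked for an interrupt or is the last of the transfer. */
	if (trb_ctrl & (MODEL_CTRL_IOC | MODEL_CTRL_LST))
		return 1;

	return 0;
}
```

In this model, a missed isoc event on an unchained TRB with neither IOC nor LST set now returns 1, where it would previously have fallen through to 0 and let the loop continue to the next request.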
Thanks, Thinh