Commit 6ccb83d6c497 ("usb: xhci: Implement xhci_handshake_check_state() helper") introduced an optimization to xhci_reset() during xhci removal, allowing it to bail out early without waiting for the reset to complete.
This behavior can cause issues on SNPS DWC3 USB controller with dual-role capability. When the DWC3 controller exits host mode and removes xhci while a reset is still in progress, and then tries to configure its hardware for device mode, the ongoing reset leads to register access issues; specifically, all register reads returns 0. These issues extend beyond the xhci register space (which is expected during a reset) and affect the entire DWC3 IP block, causing the DWC3 device mode to malfunction.
To address this, introduce the `XHCI_FULL_RESET_ON_REMOVE` quirk. When this quirk is set, xhci_reset() always completes its reset handshake, ensuring the controller is in a fully reset state before proceeding.
Cc: stable@vger.kernel.org Fixes: 6ccb83d6c497 ("usb: xhci: Implement xhci_handshake_check_state() helper") Signed-off-by: Roy Luo royluo@google.com --- drivers/usb/host/xhci-plat.c | 3 +++ drivers/usb/host/xhci.c | 8 +++++++- drivers/usb/host/xhci.h | 1 + 3 files changed, 11 insertions(+), 1 deletion(-)
diff --git a/drivers/usb/host/xhci-plat.c b/drivers/usb/host/xhci-plat.c index 3155e3a842da..19c5c26a8e63 100644 --- a/drivers/usb/host/xhci-plat.c +++ b/drivers/usb/host/xhci-plat.c @@ -265,6 +265,9 @@ int xhci_plat_probe(struct platform_device *pdev, struct device *sysdev, const s if (device_property_read_bool(tmpdev, "xhci-skip-phy-init-quirk")) xhci->quirks |= XHCI_SKIP_PHY_INIT;
+ if (device_property_read_bool(tmpdev, "xhci-full-reset-on-remove-quirk")) + xhci->quirks |= XHCI_FULL_RESET_ON_REMOVE; + device_property_read_u32(tmpdev, "imod-interval-ns", &xhci->imod_interval); } diff --git a/drivers/usb/host/xhci.c b/drivers/usb/host/xhci.c index 90eb491267b5..4f091d618c01 100644 --- a/drivers/usb/host/xhci.c +++ b/drivers/usb/host/xhci.c @@ -198,6 +198,7 @@ int xhci_reset(struct xhci_hcd *xhci, u64 timeout_us) u32 command; u32 state; int ret; + unsigned int exit_state;
state = readl(&xhci->op_regs->status);
@@ -226,8 +227,13 @@ int xhci_reset(struct xhci_hcd *xhci, u64 timeout_us) if (xhci->quirks & XHCI_INTEL_HOST) udelay(1000);
+ if (xhci->quirks & XHCI_FULL_RESET_ON_REMOVE) + exit_state = 0; + else + exit_state = XHCI_STATE_REMOVING; + ret = xhci_handshake_check_state(xhci, &xhci->op_regs->command, - CMD_RESET, 0, timeout_us, XHCI_STATE_REMOVING); + CMD_RESET, 0, timeout_us, exit_state); if (ret) return ret;
diff --git a/drivers/usb/host/xhci.h b/drivers/usb/host/xhci.h index 242ab9fbc8ae..ac65af788298 100644 --- a/drivers/usb/host/xhci.h +++ b/drivers/usb/host/xhci.h @@ -1637,6 +1637,7 @@ struct xhci_hcd { #define XHCI_WRITE_64_HI_LO BIT_ULL(47) #define XHCI_CDNS_SCTX_QUIRK BIT_ULL(48) #define XHCI_ETRON_HOST BIT_ULL(49) +#define XHCI_FULL_RESET_ON_REMOVE BIT_ULL(50)
unsigned int num_active_eps; unsigned int limit_active_eps;
On Thu, May 15, 2025, Roy Luo wrote:
Commit 6ccb83d6c497 ("usb: xhci: Implement xhci_handshake_check_state() helper") introduced an optimization to xhci_reset() during xhci removal, allowing it to bail out early without waiting for the reset to complete.
This behavior can cause issues on SNPS DWC3 USB controller with dual-role capability. When the DWC3 controller exits host mode and removes xhci while a reset is still in progress, and then tries to configure its hardware for device mode, the ongoing reset leads to register access issues; specifically, all register reads returns 0. These issues extend beyond the xhci register space (which is expected during a reset) and affect the entire DWC3 IP block, causing the DWC3 device mode to malfunction.
To address this, introduce the `XHCI_FULL_RESET_ON_REMOVE` quirk. When this quirk is set, xhci_reset() always completes its reset handshake, ensuring the controller is in a fully reset state before proceeding.
Cc: stable@vger.kernel.org Fixes: 6ccb83d6c497 ("usb: xhci: Implement xhci_handshake_check_state() helper") Signed-off-by: Roy Luo royluo@google.com
drivers/usb/host/xhci-plat.c | 3 +++ drivers/usb/host/xhci.c | 8 +++++++- drivers/usb/host/xhci.h | 1 + 3 files changed, 11 insertions(+), 1 deletion(-)
diff --git a/drivers/usb/host/xhci-plat.c b/drivers/usb/host/xhci-plat.c index 3155e3a842da..19c5c26a8e63 100644 --- a/drivers/usb/host/xhci-plat.c +++ b/drivers/usb/host/xhci-plat.c @@ -265,6 +265,9 @@ int xhci_plat_probe(struct platform_device *pdev, struct device *sysdev, const s if (device_property_read_bool(tmpdev, "xhci-skip-phy-init-quirk")) xhci->quirks |= XHCI_SKIP_PHY_INIT;
if (device_property_read_bool(tmpdev, "xhci-full-reset-on-remove-quirk"))
xhci->quirks |= XHCI_FULL_RESET_ON_REMOVE;
- device_property_read_u32(tmpdev, "imod-interval-ns", &xhci->imod_interval); }
diff --git a/drivers/usb/host/xhci.c b/drivers/usb/host/xhci.c index 90eb491267b5..4f091d618c01 100644 --- a/drivers/usb/host/xhci.c +++ b/drivers/usb/host/xhci.c @@ -198,6 +198,7 @@ int xhci_reset(struct xhci_hcd *xhci, u64 timeout_us) u32 command; u32 state; int ret;
- unsigned int exit_state;
state = readl(&xhci->op_regs->status); @@ -226,8 +227,13 @@ int xhci_reset(struct xhci_hcd *xhci, u64 timeout_us) if (xhci->quirks & XHCI_INTEL_HOST) udelay(1000);
- if (xhci->quirks & XHCI_FULL_RESET_ON_REMOVE)
exit_state = 0;
There's no state 0. Checking against that is odd. Couldn't we just use xhci_handshake() equivalent instead?
In any case, this is basically a revert of this change: 6ccb83d6c497 ("usb: xhci: Implement xhci_handshake_check_state() helper")
Can't we just revert or fix the above patch that causes a regression?
Thanks, Thinh
- else
exit_state = XHCI_STATE_REMOVING;
- ret = xhci_handshake_check_state(xhci, &xhci->op_regs->command,
CMD_RESET, 0, timeout_us, XHCI_STATE_REMOVING);
if (ret) return ret;CMD_RESET, 0, timeout_us, exit_state);
diff --git a/drivers/usb/host/xhci.h b/drivers/usb/host/xhci.h index 242ab9fbc8ae..ac65af788298 100644 --- a/drivers/usb/host/xhci.h +++ b/drivers/usb/host/xhci.h @@ -1637,6 +1637,7 @@ struct xhci_hcd { #define XHCI_WRITE_64_HI_LO BIT_ULL(47) #define XHCI_CDNS_SCTX_QUIRK BIT_ULL(48) #define XHCI_ETRON_HOST BIT_ULL(49) +#define XHCI_FULL_RESET_ON_REMOVE BIT_ULL(50) unsigned int num_active_eps; unsigned int limit_active_eps; -- 2.49.0.1112.g889b7c5bd8-goog
On Thu, 15 May 2025 23:42:50 +0000, Thinh Nguyen wrote:
In any case, this is basically a revert of this change: 6ccb83d6c497 ("usb: xhci: Implement xhci_handshake_check_state() helper")
Can't we just revert or fix the above patch that causes a regression?
Also note that 6ccb83d6c497 claimed to fix actual problems, so disabling it on selected hardware could bring the old bug back:
In some situations where xhci removal happens parallel to xhci_handshake, we encounter a scenario where the xhci_handshake can't succeed, and it polls until timeout.
If xhci_handshake runs until timeout it can on some platforms result in a long wait which might lead to a watchdog timeout.
But on the other hand, xhci_handshake() has long timeouts because the handshakes themselves can take a surprisingly long time (and sometimes still succeed), so any reliance on handshake completing before timeout is frankly a bug in itself.
Regards, Michal
There's no state 0. Checking against that is odd. Couldn't we just use xhci_handshake() equivalent instead?
Ok, I will change it in the next version.
On Thu, May 15, 2025 at 11:33 PM Michał Pecio michal.pecio@gmail.com wrote:
On Thu, 15 May 2025 23:42:50 +0000, Thinh Nguyen wrote:
In any case, this is basically a revert of this change: 6ccb83d6c497 ("usb: xhci: Implement xhci_handshake_check_state() helper")
Can't we just revert or fix the above patch that causes a regression?
Also note that 6ccb83d6c497 claimed to fix actual problems, so disabling it on selected hardware could bring the old bug back:
In some situations where xhci removal happens parallel to xhci_handshake, we encounter a scenario where the xhci_handshake can't succeed, and it polls until timeout.
If xhci_handshake runs until timeout it can on some platforms result in a long wait which might lead to a watchdog timeout.
On top of this, xhci_handshake_check_state(XHCI_STATE_REMOVING) is also used elsewhere like xhci_abort_cmd_ring(), so a simple revert is off the table. Commit 6ccb83d6c497 did not specify which platform and in what circumstance would xhci handshake timeout, adding a quirk for DWC3 seems to be the better option here.
But on the other hand, xhci_handshake() has long timeouts because the handshakes themselves can take a surprisingly long time (and sometimes still succeed), so any reliance on handshake completing before timeout is frankly a bug in itself.
This patch simply honors the contract between the software and hardware, allowing the handshake to complete. It doesn't assume the handshake will finish on time. If it times out, then it times out and returns a failure.
Thanks, Roy
Hi Roy, Michał,
On Fri, May 16, 2025, Roy Luo wrote:
There's no state 0. Checking against that is odd. Couldn't we just use xhci_handshake() equivalent instead?
Ok, I will change it in the next version.
On Thu, May 15, 2025 at 11:33 PM Michał Pecio michal.pecio@gmail.com wrote:
On Thu, 15 May 2025 23:42:50 +0000, Thinh Nguyen wrote:
In any case, this is basically a revert of this change: 6ccb83d6c497 ("usb: xhci: Implement xhci_handshake_check_state() helper")
Can't we just revert or fix the above patch that causes a regression?
Also note that 6ccb83d6c497 claimed to fix actual problems, so disabling it on selected hardware could bring the old bug back:
In some situations where xhci removal happens parallel to xhci_handshake, we encounter a scenario where the xhci_handshake can't succeed, and it polls until timeout.
If xhci_handshake runs until timeout it can on some platforms result in a long wait which might lead to a watchdog timeout.
On top of this, xhci_handshake_check_state(XHCI_STATE_REMOVING) is also used elsewhere like xhci_abort_cmd_ring(), so a simple revert is off the table. Commit 6ccb83d6c497 did not specify which platform and in what circumstance would xhci handshake timeout, adding a quirk for DWC3 seems to be the better option here.
Regarding the commit 6ccb83d6c497, I'm assuming Udipto made the change for Qcom platforms. Hi @Udipto, if you're reading this, please confirm.
Many of the Qcom platforms are using dwc3 controller. The change you made here are affecting all the dwc3 DRD controllers, which has a good chance to also impact the Qcom platforms.
But on the other hand, xhci_handshake() has long timeouts because the handshakes themselves can take a surprisingly long time (and sometimes still succeed), so any reliance on handshake completing before timeout is frankly a bug in itself.
This patch simply honors the contract between the software and hardware, allowing the handshake to complete. It doesn't assume the handshake will finish on time. If it times out, then it times out and returns a failure.
As Michał pointed out, disregarding the xhci handshake timeout is not proper. The change 6ccb83d6c497 seems to workaround some different watchdog warning timeout instead of resolving the actual issue. The watchdog timeout should not be less than the handshake timeout here.
BR, Thinh
On Fri, May 16, 2025 at 4:38 PM Thinh Nguyen Thinh.Nguyen@synopsys.com wrote:
Hi Roy, Michał,
On Fri, May 16, 2025, Roy Luo wrote:
There's no state 0. Checking against that is odd. Couldn't we just use xhci_handshake() equivalent instead?
Ok, I will change it in the next version.
On Thu, May 15, 2025 at 11:33 PM Michał Pecio michal.pecio@gmail.com wrote:
On Thu, 15 May 2025 23:42:50 +0000, Thinh Nguyen wrote:
In any case, this is basically a revert of this change: 6ccb83d6c497 ("usb: xhci: Implement xhci_handshake_check_state() helper")
Can't we just revert or fix the above patch that causes a regression?
Also note that 6ccb83d6c497 claimed to fix actual problems, so disabling it on selected hardware could bring the old bug back:
In some situations where xhci removal happens parallel to xhci_handshake, we encounter a scenario where the xhci_handshake can't succeed, and it polls until timeout.
If xhci_handshake runs until timeout it can on some platforms result in a long wait which might lead to a watchdog timeout.
On top of this, xhci_handshake_check_state(XHCI_STATE_REMOVING) is also used elsewhere like xhci_abort_cmd_ring(), so a simple revert is off the table. Commit 6ccb83d6c497 did not specify which platform and in what circumstance would xhci handshake timeout, adding a quirk for DWC3 seems to be the better option here.
Regarding the commit 6ccb83d6c497, I'm assuming Udipto made the change for Qcom platforms. Hi @Udipto, if you're reading this, please confirm.
Many of the Qcom platforms are using dwc3 controller. The change you made here are affecting all the dwc3 DRD controllers, which has a good chance to also impact the Qcom platforms.
But on the other hand, xhci_handshake() has long timeouts because the handshakes themselves can take a surprisingly long time (and sometimes still succeed), so any reliance on handshake completing before timeout is frankly a bug in itself.
This patch simply honors the contract between the software and hardware, allowing the handshake to complete. It doesn't assume the handshake will finish on time. If it times out, then it times out and returns a failure.
As Michał pointed out, disregarding the xhci handshake timeout is not proper. The change 6ccb83d6c497 seems to workaround some different watchdog warning timeout instead of resolving the actual issue. The watchdog timeout should not be less than the handshake timeout here.
This makes sense, I will send out a revert of 6ccb83d6c497 then.
Thanks, Roy
linux-stable-mirror@lists.linaro.org