This reverts commit b138e23d3dff90c0494925b4c1874227b81bddf7.
AutoRetry has been found to sometimes cause controller freezes when
communicating with buggy USB devices.
This controller feature allows the controller in host mode to send
non-terminating/burst retry ACKs instead of terminating retry ACKs
to devices when a transaction error (CRC error or overflow) occurs.
Unfortunately, if the USB device continues to respond with a CRC error,
the controller will not complete endpoint-related commands while it
keeps trying to auto-retry. [3] The xHCI driver will notice this once
it tries to abort the transfer using a Stop Endpoint command and
does not receive a completion in time. [1]
This situation is reported to dmesg:
[sda] tag#29 uas_eh_abort_handler 0 uas-tag 1 inflight: CMD IN
[sda] tag#29 CDB: opcode=0x28 28 00 00 69 42 80 00 00 48 00
xhci-hcd: xHCI host not responding to stop endpoint command
xhci-hcd: xHCI host controller not responding, assume dead
xhci-hcd: HC died; cleaning up
Some users observed this problem on an Odroid HC2 with the JMS578
USB3-to-SATA bridge. The issue can be triggered by starting
a read-heavy workload on an attached SSD. After a while, the host
controller would die and the SSD would disappear from the system. [1]
Further analysis by Synopsys determined that controller revisions
other than the one in Odroid HC2 are also affected by this.
The recommended solution was to disable AutoRetry altogether.
This change does not have a noticeable performance impact. [2]
Revert the enablement commit. This will keep the AutoRetry bit in
the default state configured during SoC design [2].
Fixes: b138e23d3dff ("usb: dwc3: core: Enable AutoRetry feature in the controller")
Link: https://lore.kernel.org/r/a21f34c04632d250cd0a78c7c6f4a1c9c7a43142.camel@gm… [1]
Link: https://lore.kernel.org/r/20230711214834.kyr6ulync32d4ktk@synopsys.com/ [2]
Link: https://lore.kernel.org/r/20230712225518.2smu7wse6djc7l5o@synopsys.com/ [3]
Cc: stable(a)vger.kernel.org
Cc: Mauro Ribeiro <mauro.ribeiro(a)hardkernel.com>
Cc: Krzysztof Kozlowski <krzysztof.kozlowski(a)linaro.org>
Suggested-by: Thinh Nguyen <Thinh.Nguyen(a)synopsys.com>
Signed-off-by: Jakub Vanek <linuxtardis(a)gmail.com>
---
V2 -> V3: Include more findings in changelog
V1 -> V2: Updated to disable AutoRetry everywhere based on Synopsys feedback
Reworded the changelog a bit to make it clearer
drivers/usb/dwc3/core.c | 16 ----------------
drivers/usb/dwc3/core.h | 3 ---
2 files changed, 19 deletions(-)
diff --git a/drivers/usb/dwc3/core.c b/drivers/usb/dwc3/core.c
index f6689b731718..a4e079d37566 100644
--- a/drivers/usb/dwc3/core.c
+++ b/drivers/usb/dwc3/core.c
@@ -1209,22 +1209,6 @@ static int dwc3_core_init(struct dwc3 *dwc)
dwc3_writel(dwc->regs, DWC3_GUCTL1, reg);
}
- if (dwc->dr_mode == USB_DR_MODE_HOST ||
- dwc->dr_mode == USB_DR_MODE_OTG) {
- reg = dwc3_readl(dwc->regs, DWC3_GUCTL);
-
- /*
- * Enable Auto retry Feature to make the controller operating in
- * Host mode on seeing transaction errors(CRC errors or internal
- * overrun scenerios) on IN transfers to reply to the device
- * with a non-terminating retry ACK (i.e, an ACK transcation
- * packet with Retry=1 & Nump != 0)
- */
- reg |= DWC3_GUCTL_HSTINAUTORETRY;
-
- dwc3_writel(dwc->regs, DWC3_GUCTL, reg);
- }
-
/*
* Must config both number of packets and max burst settings to enable
* RX and/or TX threshold.
diff --git a/drivers/usb/dwc3/core.h b/drivers/usb/dwc3/core.h
index 8b1295e4dcdd..a69ac67d89fe 100644
--- a/drivers/usb/dwc3/core.h
+++ b/drivers/usb/dwc3/core.h
@@ -256,9 +256,6 @@
#define DWC3_GCTL_GBLHIBERNATIONEN BIT(1)
#define DWC3_GCTL_DSBLCLKGTNG BIT(0)
-/* Global User Control Register */
-#define DWC3_GUCTL_HSTINAUTORETRY BIT(14)
-
/* Global User Control 1 Register */
#define DWC3_GUCTL1_DEV_DECOUPLE_L1L2_EVT BIT(31)
#define DWC3_GUCTL1_TX_IPGAP_LINECHECK_DIS BIT(28)
--
2.25.1
From: Fedor Ross <fedor.ross(a)ifm.com>
The mcp251xfd controller needs an idle bus to enter 'Normal CAN 2.0
mode' or . The maximum length of a CAN frame is 736 bits (64 data
bytes, CAN-FD, EFF mode, worst case bit stuffing and interframe
spacing). For low bit rates like 10 kbit/s the arbitrarily chosen
MCP251XFD_POLL_TIMEOUT_US of 1 ms is too small.
Otherwise during polling for the CAN controller to enter 'Normal CAN
2.0 mode' the timeout limit is exceeded and the configuration fails
with:
| $ ip link set dev can1 up type can bitrate 10000
| [ 731.911072] mcp251xfd spi2.1 can1: Controller failed to enter mode CAN 2.0 Mode (6) and stays in Configuration Mode (4) (con=0x068b0760, osc=0x00000468).
| [ 731.927192] mcp251xfd spi2.1 can1: CRC read error at address 0x0e0c (length=4, data=00 00 00 00, CRC=0x0000) retrying.
| [ 731.938101] A link change request failed with some changes committed already. Interface can1 may have been left with an inconsistent configuration, please check.
| RTNETLINK answers: Connection timed out
Make MCP251XFD_POLL_TIMEOUT_US timeout calculation dynamic. Use
maximum of 1ms and bit time of 1 full 64 data bytes CAN-FD frame in
EFF mode, worst case bit stuffing and interframe spacing at the
current bit rate.
For easier backporting define the macro MCP251XFD_FRAME_LEN_MAX_BITS
that holds the max frame length in bits, which is 736. This can be
replaced by can_frame_bits(true, true, true, true, CANFD_MAX_DLEN) in
a cleanup patch later.
Fixes: 55e5b97f003e8 ("can: mcp25xxfd: add driver for Microchip MCP25xxFD SPI CAN")
Signed-off-by: Fedor Ross <fedor.ross(a)ifm.com>
Signed-off-by: Marek Vasut <marex(a)denx.de>
Cc: Vincent Mailhol <mailhol.vincent(a)wanadoo.fr>
Cc: Manivannan Sadhasivam <mani(a)kernel.org>
Cc: Thomas Kopp <thomas.kopp(a)microchip.com>
Cc: stable(a)vger.kernel.org
Link: https://lore.kernel.org/all/20230717100815.75764-1-mkl@pengutronix.de
Signed-off-by: Marc Kleine-Budde <mkl(a)pengutronix.de>
---
Hello,
picking up Fedor's and Marek's work. I decided to make it a minimal
patch and add stable on Cc. The mentioned cleanup patch that replaces
736 by can_frame_bits() can be done later and will go upstream via
can-next.
regards,
Marc
v4:
- fix division by 0, if bit rate is not set yet
v3: https://lore.kernel.org/all/20230717100815.75764-1-mkl@pengutronix.de/
- use 736 as max CAN frame length, calculated by Vincent Mailhol's
80a2fbce456e ("can: length: refactor frame lengths definition to add size in bits")
- update commit message
- drop patch 2/2
v2: https://lore.kernel.org/all/20230505222820.126441-1-marex@denx.de
- Add macros for CAN_BIT_STUFFING_OVERHEAD and CAN_IDLE_CONDITION_SAMPLES
(thanks Thomas, but please double check the comments)
- Update commit message
v1: https://lore.kernel.org/all/20230504195059.4706-1-marex@denx.de
---
drivers/net/can/spi/mcp251xfd/mcp251xfd-core.c | 10 ++++++++--
drivers/net/can/spi/mcp251xfd/mcp251xfd.h | 1 +
2 files changed, 9 insertions(+), 2 deletions(-)
diff --git a/drivers/net/can/spi/mcp251xfd/mcp251xfd-core.c b/drivers/net/can/spi/mcp251xfd/mcp251xfd-core.c
index 68df6d4641b5..a1d58d09cc50 100644
--- a/drivers/net/can/spi/mcp251xfd/mcp251xfd-core.c
+++ b/drivers/net/can/spi/mcp251xfd/mcp251xfd-core.c
@@ -227,6 +227,8 @@ static int
__mcp251xfd_chip_set_mode(const struct mcp251xfd_priv *priv,
const u8 mode_req, bool nowait)
{
+ const struct can_bittiming *bt = &priv->can.bittiming;
+ unsigned long timeout_us = MCP251XFD_POLL_TIMEOUT_US;
u32 con = 0, con_reqop, osc = 0;
u8 mode;
int err;
@@ -246,12 +248,16 @@ __mcp251xfd_chip_set_mode(const struct mcp251xfd_priv *priv,
if (mode_req == MCP251XFD_REG_CON_MODE_SLEEP || nowait)
return 0;
+ if (bt->bitrate)
+ timeout_us = max(timeout_us,
+ MCP251XFD_FRAME_LEN_MAX_BITS * USEC_PER_SEC /
+ bt->bitrate);
+
err = regmap_read_poll_timeout(priv->map_reg, MCP251XFD_REG_CON, con,
!mcp251xfd_reg_invalid(con) &&
FIELD_GET(MCP251XFD_REG_CON_OPMOD_MASK,
con) == mode_req,
- MCP251XFD_POLL_SLEEP_US,
- MCP251XFD_POLL_TIMEOUT_US);
+ MCP251XFD_POLL_SLEEP_US, timeout_us);
if (err != -ETIMEDOUT && err != -EBADMSG)
return err;
diff --git a/drivers/net/can/spi/mcp251xfd/mcp251xfd.h b/drivers/net/can/spi/mcp251xfd/mcp251xfd.h
index 7024ff0cc2c0..24510b3b8020 100644
--- a/drivers/net/can/spi/mcp251xfd/mcp251xfd.h
+++ b/drivers/net/can/spi/mcp251xfd/mcp251xfd.h
@@ -387,6 +387,7 @@ static_assert(MCP251XFD_TIMESTAMP_WORK_DELAY_SEC <
#define MCP251XFD_OSC_STAB_TIMEOUT_US (10 * MCP251XFD_OSC_STAB_SLEEP_US)
#define MCP251XFD_POLL_SLEEP_US (10)
#define MCP251XFD_POLL_TIMEOUT_US (USEC_PER_MSEC)
+#define MCP251XFD_FRAME_LEN_MAX_BITS (736)
/* Misc */
#define MCP251XFD_NAPI_WEIGHT 32
---
base-commit: 0dd1805fe498e0cf64f68e451a8baff7e64494ec
change-id: 20230717-mcp251xfd-fix-increase-poll-timeout-1f23b0e519b0
Best regards,
--
Marc Kleine-Budde <mkl(a)pengutronix.de>
Don't assume that the device is fully under the control of PCI core.
Use RMW capability accessors in link retraining which do proper locking
to avoid losing concurrent updates to the register values.
Fixes: 4ec73791a64b ("PCI: Work around Pericom PCIe-to-PCI bridge Retrain Link erratum")
Fixes: 7d715a6c1ae5 ("PCI: add PCI Express ASPM support")
Suggested-by: Lukas Wunner <lukas(a)wunner.de>
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen(a)linux.intel.com>
Acked-by: Rafael J. Wysocki <rafael(a)kernel.org>
Cc: stable(a)vger.kernel.org
---
pci/enumeration branch moves the link retraining code into PCI core and
also conflicts with a link retraining fix in pci/aspm. The changelog
(and patch splitting) takes the move into account by not referring to
ASPM while the change itself is not based on pci/enumeration (as per
Bjorn's preference).
---
drivers/pci/pci.c | 8 ++------
1 file changed, 2 insertions(+), 6 deletions(-)
diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index 60230da957e0..f7315b13bb82 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -4927,7 +4927,6 @@ static int pcie_wait_for_link_status(struct pci_dev *pdev,
int pcie_retrain_link(struct pci_dev *pdev, bool use_lt)
{
int rc;
- u16 lnkctl;
/*
* Ensure the updated LNKCTL parameters are used during link
@@ -4939,17 +4938,14 @@ int pcie_retrain_link(struct pci_dev *pdev, bool use_lt)
if (rc)
return rc;
- pcie_capability_read_word(pdev, PCI_EXP_LNKCTL, &lnkctl);
- lnkctl |= PCI_EXP_LNKCTL_RL;
- pcie_capability_write_word(pdev, PCI_EXP_LNKCTL, lnkctl);
+ pcie_capability_set_word(pdev, PCI_EXP_LNKCTL, PCI_EXP_LNKCTL_RL);
if (pdev->clear_retrain_link) {
/*
* Due to an erratum in some devices the Retrain Link bit
* needs to be cleared again manually to allow the link
* training to succeed.
*/
- lnkctl &= ~PCI_EXP_LNKCTL_RL;
- pcie_capability_write_word(pdev, PCI_EXP_LNKCTL, lnkctl);
+ pcie_capability_clear_word(pdev, PCI_EXP_LNKCTL, PCI_EXP_LNKCTL_RL);
}
return pcie_wait_for_link_status(pdev, use_lt, !use_lt);
--
2.30.2
Many places in the kernel write the Link Control and Root Control PCI
Express Capability Registers without proper concurrency control and
this could result in losing the changes one of the writers intended to
make.
Add pcie_cap_lock spinlock into the struct pci_dev and use it to
protect bit changes made in the RMW capability accessors. Protect only
a selected set of registers by differentiating the RMW accessor
internally to locked/unlocked variants using a wrapper which has the
same signature as pcie_capability_clear_and_set_word(). As the
Capability Register (pos) given to the wrapper is always a constant,
the compiler should be able to simplify all the dead-code away.
So far only the Link Control Register (ASPM, hotplug, link retraining,
various drivers) and the Root Control Register (AER & PME) seem to
require RMW locking.
Fixes: c7f486567c1d ("PCI PM: PCIe PME root port service driver")
Fixes: f12eb72a268b ("PCI/ASPM: Use PCI Express Capability accessors")
Fixes: 7d715a6c1ae5 ("PCI: add PCI Express ASPM support")
Fixes: affa48de8417 ("staging/rdma/hfi1: Add support for enabling/disabling PCIe ASPM")
Fixes: 849a9366cba9 ("misc: rtsx: Add support new chip rts5228 mmc: rtsx: Add support MMC_CAP2_NO_MMC")
Fixes: 3d1e7aa80d1c ("misc: rtsx: Use pcie_capability_clear_and_set_word() for PCI_EXP_LNKCTL")
Fixes: c0e5f4e73a71 ("misc: rtsx: Add support for RTS5261")
Fixes: 3df4fce739e2 ("misc: rtsx: separate aspm mode into MODE_REG and MODE_CFG")
Fixes: 121e9c6b5c4c ("misc: rtsx: modify and fix init_hw function")
Fixes: 19f3bd548f27 ("mfd: rtsx: Remove LCTLR defination")
Fixes: 773ccdfd9cc6 ("mfd: rtsx: Read vendor setting from config space")
Fixes: 8275b77a1513 ("mfd: rts5249: Add support for RTS5250S power saving")
Fixes: 5da4e04ae480 ("misc: rtsx: Add support for RTS5260")
Fixes: 0f49bfbd0f2e ("tg3: Use PCI Express Capability accessors")
Fixes: 5e7dfd0fb94a ("tg3: Prevent corruption at 10 / 100Mbps w CLKREQ")
Fixes: b726e493e8dc ("r8169: sync existing 8168 device hardware start sequences with vendor driver")
Fixes: e6de30d63eb1 ("r8169: more 8168dp support.")
Fixes: 8a06127602de ("Bluetooth: hci_bcm4377: Add new driver for BCM4377 PCIe boards")
Fixes: 6f461f6c7c96 ("e1000e: enable/disable ASPM L0s and L1 and ERT according to hardware errata")
Fixes: 1eae4eb2a1c7 ("e1000e: Disable L1 ASPM power savings for 82573 mobile variants")
Fixes: 8060e169e02f ("ath9k: Enable extended synch for AR9485 to fix L0s recovery issue")
Fixes: 69ce674bfa69 ("ath9k: do btcoex ASPM disabling at initialization time")
Fixes: f37f05503575 ("mt76: mt76x2e: disable pcie_aspm by default")
Suggested-by: Lukas Wunner <lukas(a)wunner.de>
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen(a)linux.intel.com>
Reviewed-by: Rafael J. Wysocki <rafael(a)kernel.org>
Cc: stable(a)vger.kernel.org
---
drivers/pci/access.c | 20 +++++++++++++++++---
drivers/pci/probe.c | 1 +
include/linux/pci.h | 34 ++++++++++++++++++++++++++++++++--
3 files changed, 50 insertions(+), 5 deletions(-)
diff --git a/drivers/pci/access.c b/drivers/pci/access.c
index 3c230ca3de58..0b2e90d2f04f 100644
--- a/drivers/pci/access.c
+++ b/drivers/pci/access.c
@@ -497,8 +497,8 @@ int pcie_capability_write_dword(struct pci_dev *dev, int pos, u32 val)
}
EXPORT_SYMBOL(pcie_capability_write_dword);
-int pcie_capability_clear_and_set_word(struct pci_dev *dev, int pos,
- u16 clear, u16 set)
+int pcie_capability_clear_and_set_word_unlocked(struct pci_dev *dev, int pos,
+ u16 clear, u16 set)
{
int ret;
u16 val;
@@ -512,7 +512,21 @@ int pcie_capability_clear_and_set_word(struct pci_dev *dev, int pos,
return ret;
}
-EXPORT_SYMBOL(pcie_capability_clear_and_set_word);
+EXPORT_SYMBOL(pcie_capability_clear_and_set_word_unlocked);
+
+int pcie_capability_clear_and_set_word_locked(struct pci_dev *dev, int pos,
+ u16 clear, u16 set)
+{
+ unsigned long flags;
+ int ret;
+
+ spin_lock_irqsave(&dev->pcie_cap_lock, flags);
+ ret = pcie_capability_clear_and_set_word_unlocked(dev, pos, clear, set);
+ spin_unlock_irqrestore(&dev->pcie_cap_lock, flags);
+
+ return ret;
+}
+EXPORT_SYMBOL(pcie_capability_clear_and_set_word_locked);
int pcie_capability_clear_and_set_dword(struct pci_dev *dev, int pos,
u32 clear, u32 set)
diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
index 8bac3ce02609..f1587fb0ba71 100644
--- a/drivers/pci/probe.c
+++ b/drivers/pci/probe.c
@@ -2324,6 +2324,7 @@ struct pci_dev *pci_alloc_dev(struct pci_bus *bus)
.end = -1,
};
+ spin_lock_init(&dev->pcie_cap_lock);
#ifdef CONFIG_PCI_MSI
raw_spin_lock_init(&dev->msi_lock);
#endif
diff --git a/include/linux/pci.h b/include/linux/pci.h
index c69a2cc1f412..7ee498cd1f37 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -467,6 +467,7 @@ struct pci_dev {
pci_dev_flags_t dev_flags;
atomic_t enable_cnt; /* pci_enable_device has been called */
+ spinlock_t pcie_cap_lock; /* Protects RMW ops in capability accessors */
u32 saved_config_space[16]; /* Config space saved at suspend time */
struct hlist_head saved_cap_space;
int rom_attr_enabled; /* Display of ROM attribute enabled? */
@@ -1217,11 +1218,40 @@ int pcie_capability_read_word(struct pci_dev *dev, int pos, u16 *val);
int pcie_capability_read_dword(struct pci_dev *dev, int pos, u32 *val);
int pcie_capability_write_word(struct pci_dev *dev, int pos, u16 val);
int pcie_capability_write_dword(struct pci_dev *dev, int pos, u32 val);
-int pcie_capability_clear_and_set_word(struct pci_dev *dev, int pos,
- u16 clear, u16 set);
+int pcie_capability_clear_and_set_word_unlocked(struct pci_dev *dev, int pos,
+ u16 clear, u16 set);
+int pcie_capability_clear_and_set_word_locked(struct pci_dev *dev, int pos,
+ u16 clear, u16 set);
int pcie_capability_clear_and_set_dword(struct pci_dev *dev, int pos,
u32 clear, u32 set);
+/**
+ * pcie_capability_clear_and_set_word - RMW accessor for PCI Express Capability Registers
+ * @dev: PCI device structure of the PCI Express device
+ * @pos: PCI Express Capability Register
+ * @clear: Clear bitmask
+ * @set: Set bitmask
+ *
+ * Perform a Read-Modify-Write (RMW) operation using @clear and @set
+ * bitmasks on PCI Express Capability Register at @pos. Certain PCI Express
+ * Capability Registers are accessed concurrently in RMW fashion, hence
+ * require locking which is handled transparently to the caller.
+ */
+static inline int pcie_capability_clear_and_set_word(struct pci_dev *dev,
+ int pos,
+ u16 clear, u16 set)
+{
+ switch (pos) {
+ case PCI_EXP_LNKCTL:
+ case PCI_EXP_RTCTL:
+ return pcie_capability_clear_and_set_word_locked(dev, pos,
+ clear, set);
+ default:
+ return pcie_capability_clear_and_set_word_unlocked(dev, pos,
+ clear, set);
+ }
+}
+
static inline int pcie_capability_set_word(struct pci_dev *dev, int pos,
u16 set)
{
--
2.30.2