This device has a silicon bug that makes it report a timeout interrupt but no data in the FIFO.
The datasheet states the following in the errata section 18.1.4:
"If the host reads the receive FIFO at the same time as a time-out interrupt condition happens, the host might read 0xCC (time-out) in the Interrupt Indication Register (IIR), but bit 0 of the Line Status Register (LSR) is not set (means there is no data in the receive FIFO)."
The errata doesn't explicitly mention that, but tests have shown and the vendor has confirmed that the RXLVL register is equally affected.
This bug has hit us on production units. When it does, sc16is7xx_irq() spins forever: sc16is7xx_port_irq() keeps seeing an interrupt in the IIR register that is never cleared, because the driver only calls into sc16is7xx_handle_rx() when the RXLVL register reports at least one byte in the FIFO.
Fix this by always reading one byte when this condition is detected in order to clear the interrupt. This approach was confirmed to be correct by NXP through their support channels.
Signed-off-by: Daniel Mack <daniel@zonque.org>
Co-Developed-by: Maxim Popov <maxim.snafu@gmail.com>
Cc: stable@vger.kernel.org
---
Meanwhile, NXP has confirmed this fix to be correct.
v4: NXP has confirmed the fix; update the commit log accordingly
v3: re-added the additional Co-Developed-by and stable@ tags
v2: reworded the commit log a bit for more context.
 drivers/tty/serial/sc16is7xx.c | 12 ++++++++++++
 1 file changed, 12 insertions(+)
diff --git a/drivers/tty/serial/sc16is7xx.c b/drivers/tty/serial/sc16is7xx.c
index 289ca7d4e566..76f76e510ed1 100644
--- a/drivers/tty/serial/sc16is7xx.c
+++ b/drivers/tty/serial/sc16is7xx.c
@@ -765,6 +765,18 @@ static bool sc16is7xx_port_irq(struct sc16is7xx_port *s, int portno)
 	case SC16IS7XX_IIR_RTOI_SRC:
 	case SC16IS7XX_IIR_XOFFI_SRC:
 		rxlen = sc16is7xx_port_read(port, SC16IS7XX_RXLVL_REG);
+
+		/*
+		 * There is a silicon bug that makes the chip report a
+		 * time-out interrupt but no data in the FIFO. This is
+		 * described in errata section 18.1.4.
+		 *
+		 * When this happens, read one byte from the FIFO to
+		 * clear the interrupt.
+		 */
+		if (iir == SC16IS7XX_IIR_RTOI_SRC && !rxlen)
+			rxlen = 1;
+
 		if (rxlen)
 			sc16is7xx_handle_rx(port, rxlen, iir);
 		break;
On Wed, 22 Nov 2023 08:35:41 +0100 Daniel Mack <daniel@zonque.org> wrote:
> This device has a silicon bug that makes it report a timeout interrupt but no data in the FIFO.
> The datasheet states the following in the errata section 18.1.4:
> "If the host reads the receive FIFO at the same time as a time-out interrupt condition happens, the host might read 0xCC (time-out) in the Interrupt Indication Register (IIR), but bit 0 of the Line Status Register (LSR) is not set (means there is no data in the receive FIFO)."
> The errata doesn't explicitly mention that, but tests have shown and the vendor has confirmed that the RXLVL register is equally affected.
Hi Daniel,
thank you for the feedback from NXP.
I would suggest replacing this paragraph with something like this:
------
The errata description seems to indicate that it only affects the polled mode of operation when reading bit 0 of the LSR register. But when using interrupt mode (IRQ), as this driver does, reading RXLVL gives a value of zero even if there is data in the Rx FIFO (confirmed by tests and NXP).
------
> This bug has hit us on production units and when it does, sc16is7xx_irq() would spin forever because sc16is7xx_port_irq() keeps seeing an interrupt in the IIR register that is not cleared because the driver does not call into sc16is7xx_handle_rx() unless the RXLVL register reports at least one byte in the FIFO.
> Fix this by always reading one byte when this condition is detected
Change "reading one byte" to "reading one byte from the Rx FIFO".
> in order to clear the interrupt. This approach was confirmed to be correct by NXP through their support channels.
> Signed-off-by: Daniel Mack <daniel@zonque.org>
> Co-Developed-by: Maxim Popov <maxim.snafu@gmail.com>
> Cc: stable@vger.kernel.org
I have tested your patch over the last few days and was not able to reproduce the problem (I added a trace to detect the condition). At the same time, it has not caused any regressions.
With the above changes, feel free to add:
Tested-by: Hugo Villeneuve <hvilleneuve@dimonoff.com>
Hugo.
-- 2.41.0
linux-stable-mirror@lists.linaro.org