Commit f6aa5beb45be ("serial: 8250: Fix clearing FIFOs in RS485 mode again") makes a change to FIFO clearing code which its commit message suggests was intended to be specific to use with RS485 mode, however:
1) The change made does not just affect __do_stop_tx_rs485(), it also affects other uses of serial8250_clear_fifos() including paths for starting up, shutting down or auto-configuring a port regardless of whether it's an RS485 port or not.
2) It makes the assumption that resetting the FIFOs is a no-op when FIFOs are disabled, and as such it checks for this case & explicitly avoids setting the FIFO reset bits when the FIFO enable bit is clear. A reading of the PC16550D manual would suggest that this is OK since the FIFO should automatically be reset if it is later enabled, but we support many 16550-compatible devices and have never required this auto-reset behaviour for at least the whole git era. Starting to rely on it now seems risky, offers no benefit, and indeed breaks at least the Ingenic JZ4780's UARTs which reads garbage when the RX FIFO is enabled if we don't explicitly reset it.
3) By only resetting the FIFOs if they're enabled, the behaviour of serial8250_do_startup() during boot now depends on what the value of FCR is before the 8250 driver is probed. This in itself seems questionable and leaves us with FCR=0 & no FIFO reset if the UART was used by 8250_early, otherwise it depends upon what the bootloader left behind.
4) Although the naming of serial8250_clear_fifos() may be unclear, it is clear that callers of it expect that it will disable FIFOs. Both serial8250_do_startup() & serial8250_do_shutdown() contain comments to that effect, and other callers explicitly re-enable the FIFOs after calling serial8250_clear_fifos(). The premise of that patch that disabling the FIFOs is incorrect therefore seems wrong.
For these reasons, this reverts commit f6aa5beb45be ("serial: 8250: Fix clearing FIFOs in RS485 mode again").
Signed-off-by: Paul Burton paul.burton@mips.com Fixes: f6aa5beb45be ("serial: 8250: Fix clearing FIFOs in RS485 mode again"). Cc: Greg Kroah-Hartman gregkh@linuxfoundation.org Cc: Daniel Jedrychowski avistel@gmail.com Cc: Marek Vasut marex@denx.de Cc: linux-mips@vger.kernel.org Cc: linux-serial@vger.kernel.org Cc: stable stable@vger.kernel.org # 4.10+ --- I did suggest an alternative approach which would rename serial8250_clear_fifos() and split it into 2 variants - one that disables FIFOs & one that does not, then use the latter in __do_stop_tx_rs485():
https://lore.kernel.org/lkml/20181213014805.77u5dzydo23cm6fq@pburton-laptop/
However I have no access to the OMAP3 hardware that Marek's patch was attempting to fix & have heard nothing back with regards to him testing that approach, so here's a simple revert that fixes the Ingenic JZ4780.
I've marked for stable back to v4.10 presuming that this is how far the broken patch may be backported, given that this is where commit 2bed8a8e7072 ("Clearing FIFOs in RS485 emulation mode causes subsequent transmits to break") that it tried to fix was introduced. --- drivers/tty/serial/8250/8250_port.c | 29 +++++------------------------ 1 file changed, 5 insertions(+), 24 deletions(-)
diff --git a/drivers/tty/serial/8250/8250_port.c b/drivers/tty/serial/8250/8250_port.c index f776b3eafb96..3f779d25ec0c 100644 --- a/drivers/tty/serial/8250/8250_port.c +++ b/drivers/tty/serial/8250/8250_port.c @@ -552,30 +552,11 @@ static unsigned int serial_icr_read(struct uart_8250_port *up, int offset) */ static void serial8250_clear_fifos(struct uart_8250_port *p) { - unsigned char fcr; - unsigned char clr_mask = UART_FCR_CLEAR_RCVR | UART_FCR_CLEAR_XMIT; - if (p->capabilities & UART_CAP_FIFO) { - /* - * Make sure to avoid changing FCR[7:3] and ENABLE_FIFO bits. - * In case ENABLE_FIFO is not set, there is nothing to flush - * so just return. Furthermore, on certain implementations of - * the 8250 core, the FCR[7:3] bits may only be changed under - * specific conditions and changing them if those conditions - * are not met can have nasty side effects. One such core is - * the 8250-omap present in TI AM335x. - */ - fcr = serial_in(p, UART_FCR); - - /* FIFO is not enabled, there's nothing to clear. */ - if (!(fcr & UART_FCR_ENABLE_FIFO)) - return; - - fcr |= clr_mask; - serial_out(p, UART_FCR, fcr); - - fcr &= ~clr_mask; - serial_out(p, UART_FCR, fcr); + serial_out(p, UART_FCR, UART_FCR_ENABLE_FIFO); + serial_out(p, UART_FCR, UART_FCR_ENABLE_FIFO | + UART_FCR_CLEAR_RCVR | UART_FCR_CLEAR_XMIT); + serial_out(p, UART_FCR, 0); } }
@@ -1467,7 +1448,7 @@ static void __do_stop_tx_rs485(struct uart_8250_port *p) * Enable previously disabled RX interrupts. */ if (!(p->port.rs485.flags & SER_RS485_RX_DURING_TX)) { - serial8250_clear_fifos(p); + serial8250_clear_and_reinit_fifos(p);
p->ier |= UART_IER_RLSI | UART_IER_RDI; serial_port_out(&p->port, UART_IER, p->ier);
On 12/16/2018 09:10 PM, Paul Burton wrote:
Commit f6aa5beb45be ("serial: 8250: Fix clearing FIFOs in RS485 mode again") makes a change to FIFO clearing code which its commit message suggests was intended to be specific to use with RS485 mode, however:
The change made does not just affect __do_stop_tx_rs485(), it also affects other uses of serial8250_clear_fifos() including paths for starting up, shutting down or auto-configuring a port regardless of whether it's an RS485 port or not.
It makes the assumption that resetting the FIFOs is a no-op when FIFOs are disabled, and as such it checks for this case & explicitly avoids setting the FIFO reset bits when the FIFO enable bit is clear. A reading of the PC16550D manual would suggest that this is OK since the FIFO should automatically be reset if it is later enabled, but we support many 16550-compatible devices and have never required this auto-reset behaviour for at least the whole git era. Starting to rely on it now seems risky, offers no benefit, and indeed breaks at least the Ingenic JZ4780's UARTs which reads garbage when the RX FIFO is enabled if we don't explicitly reset it.
By only resetting the FIFOs if they're enabled, the behaviour of serial8250_do_startup() during boot now depends on what the value of FCR is before the 8250 driver is probed. This in itself seems questionable and leaves us with FCR=0 & no FIFO reset if the UART was used by 8250_early, otherwise it depends upon what the bootloader left behind.
Although the naming of serial8250_clear_fifos() may be unclear, it is clear that callers of it expect that it will disable FIFOs. Both serial8250_do_startup() & serial8250_do_shutdown() contain comments to that effect, and other callers explicitly re-enable the FIFOs after calling serial8250_clear_fifos(). The premise of that patch that disabling the FIFOs is incorrect therefore seems wrong.
For these reasons, this reverts commit f6aa5beb45be ("serial: 8250: Fix clearing FIFOs in RS485 mode again").
Signed-off-by: Paul Burton paul.burton@mips.com Fixes: f6aa5beb45be ("serial: 8250: Fix clearing FIFOs in RS485 mode again"). Cc: Greg Kroah-Hartman gregkh@linuxfoundation.org Cc: Daniel Jedrychowski avistel@gmail.com Cc: Marek Vasut marex@denx.de Cc: linux-mips@vger.kernel.org Cc: linux-serial@vger.kernel.org Cc: stable stable@vger.kernel.org # 4.10+
I am unable to test it on such a short notice as I'm currently ill, so I cannot tell if your change breaks the OMAP3/AM335x boards or not. Given that there are very few CI20 boards in use, I'd like to ask you for some extra time to investigate this on the OMAP3 too.
btw what strikes me as curious is that this patch emerged shortly after Ezequiel re-posted the CI20 U-Boot patches after an year-long hiatus, is it somehow related ?
Hi Marek,
On Sun, Dec 16, 2018 at 09:32:19PM +0100, Marek Vasut wrote:
I am unable to test it on such a short notice as I'm currently ill, so I cannot tell if your change breaks the OMAP3/AM335x boards or not. Given that there are very few CI20 boards in use, I'd like to ask you for some extra time to investigate this on the OMAP3 too.
I'm sorry to hear that you're ill, but your patch is getting awfully close to becoming part of a stable kernel release & it causes regressions. Even if it didn't break a board I use, I think the patch would be broken & risky for the reasons I outlined in my revert's commit message.
Ultimately it's Greg's decision but it sounds like you're asking me to say it's OK to break the JZ4780 in a stable kernel with a patch that I think would be risky anyway, and I won't do that.
btw what strikes me as curious is that this patch emerged shortly after Ezequiel re-posted the CI20 U-Boot patches after an year-long hiatus, is it somehow related ?
Not at all - I regularly test on Ci20 & found this breakage whilst testing Paul Cercueil's Ingenic TCU patchset v8 [1]. Using a Ci20 with mainline kernels doesn't rely on Ezequiel's U-Boot work, and indeed I generally boot using my ci20-usb-boot tool [2] so U-Boot isn't involved at all.
Now my Ci20 testing isn't automated so it tends to happen mostly when there are obvious changes to the board or SoC support, but it should become more automated soon. Kevin Hilman just got a Ci40 working with kernelci.org infrastructure & I hope we can get Ci20 & other boards included soon too, either via existing kernelci.org labs or by setting up a new one.
Thanks, Paul
[1] https://lore.kernel.org/linux-mips/20181212220922.18759-1-paul@crapouillou.n... [2] https://github.com/paulburton/ci20-tools
On 12/16/2018 10:31 PM, Paul Burton wrote:
Hi Marek,
Hi,
On Sun, Dec 16, 2018 at 09:32:19PM +0100, Marek Vasut wrote:
I am unable to test it on such a short notice as I'm currently ill, so I cannot tell if your change breaks the OMAP3/AM335x boards or not. Given that there are very few CI20 boards in use, I'd like to ask you for some extra time to investigate this on the OMAP3 too.
I'm sorry to hear that you're ill, but your patch is getting awfully close to becoming part of a stable kernel release & it causes regressions. Even if it didn't break a board I use, I think the patch would be broken & risky for the reasons I outlined in my revert's commit message.
That's what the incremental releases are for, so that minor problems can get fixed there. Sure, it's great to have things perfect in the first release, but if that breaks other systems, too bad.
Ultimately it's Greg's decision but it sounds like you're asking me to say it's OK to break the JZ4780 in a stable kernel with a patch that I think would be risky anyway, and I won't do that.
I am saying this revert breaks AM335x, so this is a stalemate. I had a discussion with Ezequiel (on CC) and he seems to have a different smaller patch coming for this problem.
[...]
On Sun, 16 Dec 2018 at 18:45, Marek Vasut marex@denx.de wrote: [skips discussion]
Ultimately it's Greg's decision but it sounds like you're asking me to say it's OK to break the JZ4780 in a stable kernel with a patch that I think would be risky anyway, and I won't do that.
I am saying this revert breaks AM335x, so this is a stalemate. I had a discussion with Ezequiel (on CC) and he seems to have a different smaller patch coming for this problem.
Can you guys test this? Note that serial8250_do_startup has a comment stating clearly that it has the intention of disabling the FIFOs, so it seems this is the right thing to do.
Paul, this removes the garbage on my CI20 (rev.1)
diff --git a/drivers/tty/serial/8250/8250_port.c b/drivers/tty/serial/8250/8250_port.c index c39482b96111..fac19cbc51d1 100644 --- a/drivers/tty/serial/8250/8250_port.c +++ b/drivers/tty/serial/8250/8250_port.c @@ -2209,10 +2209,11 @@ int serial8250_do_startup(struct uart_port *port) /* * Clear the FIFO buffers and disable them. * (they will be reenabled in set_termios()) */ serial8250_clear_fifos(up); + serial_out(up, UART_FCR, 0);
/* * Clear the interrupt registers. */ serial_port_in(port, UART_LSR);
On 12/16/2018 10:52 PM, Ezequiel Garcia wrote:
On Sun, 16 Dec 2018 at 18:45, Marek Vasut marex@denx.de wrote: [skips discussion]
Ultimately it's Greg's decision but it sounds like you're asking me to say it's OK to break the JZ4780 in a stable kernel with a patch that I think would be risky anyway, and I won't do that.
I am saying this revert breaks AM335x, so this is a stalemate. I had a discussion with Ezequiel (on CC) and he seems to have a different smaller patch coming for this problem.
Can you guys test this? Note that serial8250_do_startup has a comment stating clearly that it has the intention of disabling the FIFOs, so it seems this is the right thing to do.
Paul, this removes the garbage on my CI20 (rev.1)
diff --git a/drivers/tty/serial/8250/8250_port.c b/drivers/tty/serial/8250/8250_port.c index c39482b96111..fac19cbc51d1 100644 --- a/drivers/tty/serial/8250/8250_port.c +++ b/drivers/tty/serial/8250/8250_port.c @@ -2209,10 +2209,11 @@ int serial8250_do_startup(struct uart_port *port) /* * Clear the FIFO buffers and disable them. * (they will be reenabled in set_termios()) */ serial8250_clear_fifos(up);
serial_out(up, UART_FCR, 0); /* * Clear the interrupt registers. */ serial_port_in(port, UART_LSR);
On AM335x pocketbeagle Tested-by: Marek Vasut marex@denx.de
Hi Ezequiel,
On Sun, Dec 16, 2018 at 06:52:53PM -0300, Ezequiel Garcia wrote:
diff --git a/drivers/tty/serial/8250/8250_port.c b/drivers/tty/serial/8250/8250_port.c index c39482b96111..fac19cbc51d1 100644 --- a/drivers/tty/serial/8250/8250_port.c +++ b/drivers/tty/serial/8250/8250_port.c @@ -2209,10 +2209,11 @@ int serial8250_do_startup(struct uart_port *port) /* * Clear the FIFO buffers and disable them. * (they will be reenabled in set_termios()) */ serial8250_clear_fifos(up);
serial_out(up, UART_FCR, 0); /* * Clear the interrupt registers. */ serial_port_in(port, UART_LSR);
This helps, but it only addresses one part of one of the 4 reasons I listed as motivation for my revert. For example serial8250_do_shutdown() also clearly intends to disable the FIFOs.
Thanks, Paul
On Sun, 16 Dec 2018 at 19:24, Paul Burton paul.burton@mips.com wrote:
Hi Ezequiel,
On Sun, Dec 16, 2018 at 06:52:53PM -0300, Ezequiel Garcia wrote:
diff --git a/drivers/tty/serial/8250/8250_port.c b/drivers/tty/serial/8250/8250_port.c index c39482b96111..fac19cbc51d1 100644 --- a/drivers/tty/serial/8250/8250_port.c +++ b/drivers/tty/serial/8250/8250_port.c @@ -2209,10 +2209,11 @@ int serial8250_do_startup(struct uart_port *port) /* * Clear the FIFO buffers and disable them. * (they will be reenabled in set_termios()) */ serial8250_clear_fifos(up);
serial_out(up, UART_FCR, 0); /* * Clear the interrupt registers. */ serial_port_in(port, UART_LSR);
This helps, but it only addresses one part of one of the 4 reasons I listed as motivation for my revert. For example serial8250_do_shutdown() also clearly intends to disable the FIFOs.
OK. So, let's fix that :-)
By all means, it would be really nice to push forward and fix the garbage issue on JZ4780, as well as the transmission issue on AM335x.
AM335x is a wildly popular platform, and it's not funny to break it. So, let's please stop discussing which board we'll break and just fix both.
Hi Ezequiel,
On Sun, Dec 16, 2018 at 07:28:22PM -0300, Ezequiel Garcia wrote:
On Sun, 16 Dec 2018 at 19:24, Paul Burton paul.burton@mips.com wrote:
This helps, but it only addresses one part of one of the 4 reasons I listed as motivation for my revert. For example serial8250_do_shutdown() also clearly intends to disable the FIFOs.
OK. So, let's fix that :-)
I already did, or at least tried to, on Thursday [1].
By all means, it would be really nice to push forward and fix the garbage issue on JZ4780, as well as the transmission issue on AM335x.
AM335x is a wildly popular platform, and it's not funny to break it.
Well, clearly not if it was broken in v4.10 & only just fixed..? And from Marek's commit message the patch in v4.10 doesn't break the whole system just RS485.
So, let's please stop discussing which board we'll break and just fix both.
I completely agree that would be ideal and I wrote a patch hoping to do that on Thursday, but didn't get any response on testing. It's late in the cycle hence a revert made sense. Simple as that.
Thanks, Paul
[1] https://lore.kernel.org/lkml/20181213014805.77u5dzydo23cm6fq@pburton-laptop/
On Sun, Dec 16, 2018 at 10:35:12PM +0000, Paul Burton wrote:
Hi Ezequiel,
On Sun, Dec 16, 2018 at 07:28:22PM -0300, Ezequiel Garcia wrote:
On Sun, 16 Dec 2018 at 19:24, Paul Burton paul.burton@mips.com wrote:
This helps, but it only addresses one part of one of the 4 reasons I listed as motivation for my revert. For example serial8250_do_shutdown() also clearly intends to disable the FIFOs.
OK. So, let's fix that :-)
I already did, or at least tried to, on Thursday [1].
By all means, it would be really nice to push forward and fix the garbage issue on JZ4780, as well as the transmission issue on AM335x.
AM335x is a wildly popular platform, and it's not funny to break it.
Well, clearly not if it was broken in v4.10 & only just fixed..? And from Marek's commit message the patch in v4.10 doesn't break the whole system just RS485.
So, let's please stop discussing which board we'll break and just fix both.
I completely agree that would be ideal and I wrote a patch hoping to do that on Thursday, but didn't get any response on testing. It's late in the cycle hence a revert made sense. Simple as that.
A revert makes sense now, I'll go queue this up, thanks.
greg k-h
On 12/17/2018 04:18 PM, Greg Kroah-Hartman wrote:
On Sun, Dec 16, 2018 at 10:35:12PM +0000, Paul Burton wrote:
Hi Ezequiel,
On Sun, Dec 16, 2018 at 07:28:22PM -0300, Ezequiel Garcia wrote:
On Sun, 16 Dec 2018 at 19:24, Paul Burton paul.burton@mips.com wrote:
This helps, but it only addresses one part of one of the 4 reasons I listed as motivation for my revert. For example serial8250_do_shutdown() also clearly intends to disable the FIFOs.
OK. So, let's fix that :-)
I already did, or at least tried to, on Thursday [1].
By all means, it would be really nice to push forward and fix the garbage issue on JZ4780, as well as the transmission issue on AM335x.
AM335x is a wildly popular platform, and it's not funny to break it.
Well, clearly not if it was broken in v4.10 & only just fixed..? And from Marek's commit message the patch in v4.10 doesn't break the whole system just RS485.
So, let's please stop discussing which board we'll break and just fix both.
I completely agree that would be ideal and I wrote a patch hoping to do that on Thursday, but didn't get any response on testing. It's late in the cycle hence a revert made sense. Simple as that.
A revert makes sense now, I'll go queue this up, thanks.
I don't like this for multiple reasons. 1) There is a better patch posted which doesn't break the AM335x and clearly identifies and fixes the problem on the JZ4780 / CI20 2) The JZ4780 8250 core is not a standard 8250 core, since it has extra bits in the FCR register (like the UME bit, which disables the whole UART block), so the revert IMO would break that core too, it just hides the breakage. I'm still trying to understand the implications of that in detail, but the discussion wasn't quite constructive.
I'd much rather see Ezequiel's patch applied, since that's far less destructive approach to fixing the problem than the revert.
On Sun, 16 Dec 2018 at 19:35, Paul Burton paul.burton@mips.com wrote:
Hi Ezequiel,
On Sun, Dec 16, 2018 at 07:28:22PM -0300, Ezequiel Garcia wrote:
On Sun, 16 Dec 2018 at 19:24, Paul Burton paul.burton@mips.com wrote:
This helps, but it only addresses one part of one of the 4 reasons I listed as motivation for my revert. For example serial8250_do_shutdown() also clearly intends to disable the FIFOs.
OK. So, let's fix that :-)
I already did, or at least tried to, on Thursday [1].
By all means, it would be really nice to push forward and fix the garbage issue on JZ4780, as well as the transmission issue on AM335x.
AM335x is a wildly popular platform, and it's not funny to break it.
Well, clearly not if it was broken in v4.10 & only just fixed..? And from Marek's commit message the patch in v4.10 doesn't break the whole system just RS485.
Careful here. It's naive to consider v4.10 as old in this context.
AM335x is used in hundreds of thousands of products, probably "industrial" products. Manufacturers don't always follow the kernel, and it's entirely likely that they notice a regression only when developing a new product, or when rebasing on top of a newer longterm kernel.
Then again, I don't have any details about what is Marek's original fix actually fixing.
Marek: could you please post the test case that you used to validate your fix? I can't find that anywhere.
So, let's please stop discussing which board we'll break and just fix both.
I completely agree that would be ideal and I wrote a patch hoping to do that on Thursday, but didn't get any response on testing. It's late in the cycle hence a revert made sense. Simple as that.
Thanks, Paul
[1] https://lore.kernel.org/lkml/20181213014805.77u5dzydo23cm6fq@pburton-laptop/
Well, I think this patch a lot of sense. But since Greg wants to revert Marek's fix, it would make sense to collate both Marek's and Paul's in a single commit.
On 12/17/2018 05:30 PM, Ezequiel Garcia wrote:
On Sun, 16 Dec 2018 at 19:35, Paul Burton paul.burton@mips.com wrote:
Hi Ezequiel,
On Sun, Dec 16, 2018 at 07:28:22PM -0300, Ezequiel Garcia wrote:
On Sun, 16 Dec 2018 at 19:24, Paul Burton paul.burton@mips.com wrote:
This helps, but it only addresses one part of one of the 4 reasons I listed as motivation for my revert. For example serial8250_do_shutdown() also clearly intends to disable the FIFOs.
OK. So, let's fix that :-)
I already did, or at least tried to, on Thursday [1].
By all means, it would be really nice to push forward and fix the garbage issue on JZ4780, as well as the transmission issue on AM335x.
AM335x is a wildly popular platform, and it's not funny to break it.
Well, clearly not if it was broken in v4.10 & only just fixed..? And from Marek's commit message the patch in v4.10 doesn't break the whole system just RS485.
Careful here. It's naive to consider v4.10 as old in this context.
AM335x is used in hundreds of thousands of products, probably "industrial" products. Manufacturers don't always follow the kernel, and it's entirely likely that they notice a regression only when developing a new product, or when rebasing on top of a newer longterm kernel.
Then again, I don't have any details about what is Marek's original fix actually fixing.
Marek: could you please post the test case that you used to validate your fix? I can't find that anywhere.
I can't share the testcase itself because it has no license and I didn't write it, but I can tell you what it's doing. (I'll check if I can share the testcase verbatim too, I just sent an email to the author)
The test spawns two threads, one sending , one receiving. The sending thread sends 8 bytes of data from /dev/ttyS4 , the receiving thread receives the data on /dev/ttyS2 and compares the pattern. The port settings is B1000000 8N1 with rs485conf.flags set to SER_RS485_ENABLED and SER_RS485_RTS_AFTER_SEND. Sometimes the received data do not match the sent data, but rather look as if one character was left over from the previous transmission in the FIFO.
Those two UARTs are connected together by two wires, without any real RS485 hardware to minimize the hardware complexity (it's easy to implement that on the pocketbeagle, which is the cheap option here).
On 12/17/2018 06:20 PM, Marek Vasut wrote:
On 12/17/2018 05:30 PM, Ezequiel Garcia wrote:
On Sun, 16 Dec 2018 at 19:35, Paul Burton paul.burton@mips.com wrote:
Hi Ezequiel,
On Sun, Dec 16, 2018 at 07:28:22PM -0300, Ezequiel Garcia wrote:
On Sun, 16 Dec 2018 at 19:24, Paul Burton paul.burton@mips.com wrote:
This helps, but it only addresses one part of one of the 4 reasons I listed as motivation for my revert. For example serial8250_do_shutdown() also clearly intends to disable the FIFOs.
OK. So, let's fix that :-)
I already did, or at least tried to, on Thursday [1].
By all means, it would be really nice to push forward and fix the garbage issue on JZ4780, as well as the transmission issue on AM335x.
AM335x is a wildly popular platform, and it's not funny to break it.
Well, clearly not if it was broken in v4.10 & only just fixed..? And from Marek's commit message the patch in v4.10 doesn't break the whole system just RS485.
Careful here. It's naive to consider v4.10 as old in this context.
AM335x is used in hundreds of thousands of products, probably "industrial" products. Manufacturers don't always follow the kernel, and it's entirely likely that they notice a regression only when developing a new product, or when rebasing on top of a newer longterm kernel.
Then again, I don't have any details about what is Marek's original fix actually fixing.
Marek: could you please post the test case that you used to validate your fix? I can't find that anywhere.
I can't share the testcase itself because it has no license and I didn't write it, but I can tell you what it's doing. (I'll check if I can share the testcase verbatim too, I just sent an email to the author)
The test spawns two threads, one sending , one receiving. The sending thread sends 8 bytes of data from /dev/ttyS4 , the receiving thread receives the data on /dev/ttyS2 and compares the pattern. The port settings is B1000000 8N1 with rs485conf.flags set to SER_RS485_ENABLED and SER_RS485_RTS_AFTER_SEND. Sometimes the received data do not match the sent data, but rather look as if one character was left over from the previous transmission in the FIFO.
Those two UARTs are connected together by two wires, without any real RS485 hardware to minimize the hardware complexity (it's easy to implement that on the pocketbeagle, which is the cheap option here).
Test code is below . On the pocketbeagle, connect SPI0:CLK with U4:TX and SPI0:MISO with U4:RX , apply the DT patch below and run the example. It'll fail after a few iterations.
#include <errno.h> #include <fcntl.h> #include <string.h> #include <termios.h> #include <unistd.h>
#include <iostream> #include <iomanip> #include <atomic> #include <thread>
#include <linux/serial.h> #include <sys/ioctl.h>
std::atomic<bool> running{true};
int set_interface_attribs (int fd, int speed, int parity) { struct termios tty; memset (&tty, 0, sizeof tty); if (tcgetattr (fd, &tty) != 0) { std::cerr << "Error from tcgetattr" << std::endl; return -1; }
cfsetospeed (&tty, speed); cfsetispeed (&tty, speed);
tty.c_cflag = (tty.c_cflag & ~CSIZE) | CS8; tty.c_iflag &= ~IGNBRK; tty.c_lflag = 0;
tty.c_oflag = 0; tty.c_cc[VMIN] = 8; tty.c_cc[VTIME] = 0;
tty.c_iflag &= ~(IXON | IXOFF | IXANY);
tty.c_cflag |= (CLOCAL | CREAD); tty.c_cflag &= ~(PARENB | PARODD); tty.c_cflag |= parity; tty.c_cflag &= ~CSTOPB; tty.c_cflag &= ~CRTSCTS;
if (tcsetattr (fd, TCSANOW, &tty) != 0) { std::cerr << "Error from tcsetattr" << std::endl; return -1; } return 0; }
void enable_rts(int fd, int rts) { struct serial_rs485 rs485conf; if (rts) { rs485conf.flags = SER_RS485_ENABLED ; rs485conf.flags |= SER_RS485_RTS_AFTER_SEND; rs485conf.flags &= ~(SER_RS485_RTS_ON_SEND); rs485conf.delay_rts_before_send = 0; rs485conf.delay_rts_after_send = 0; } else { rs485conf.flags = 0 ; }
if (ioctl( fd, TIOCSRS485, &rs485conf) < 0){ std::cerr << "Cannot set device in RS485 mode" << std::endl; } }
void output_function(int fd) { while(running) { write (fd, "asdfghjk", 8); usleep (20000); } }
void input_function(int fd) { char buf [100]; size_t count=0; std::cout << std::unitbuf; while(running){ int n = read (fd, buf, sizeof buf); if( (n >0) && (buf[0] != 'a') ){ std::cout << "\nunexpected receive! " << std::string(buf,8) << std::endl; running = false; } else { ++count; if(count%100 == 0){ std::cout << "\r" << std::setw(8) << count << " possibly ok"; } } } }
int main(int argc, char** argv) { std::string in_port = "/dev/ttyS2"; std::string out_port = "/dev/ttyS4";
int c, rts = 1;
while ((c = getopt (argc, argv, "i:o:r")) != -1) { switch (c) { case 'i': in_port = std::string(optarg); break; case 'o': out_port = std::string(optarg); break; case 'r': rts = 0; break; case '?': return 1; default: break; } }
std::cout << "opening output " << out_port << std::endl; int outfd = open ( out_port.c_str(), O_RDWR | O_NOCTTY | O_SYNC); if (outfd < 0) { std::cerr << "Could not open " << out_port << std::endl; return 1; }
std::cout << "opening input " << in_port << std::endl; int infd = open ( in_port.c_str(), O_RDWR | O_NOCTTY | O_SYNC); if (infd < 0) { std::cerr << "Could not open " << in_port << std::endl; return 1; }
set_interface_attribs (infd, B1000000, 0); set_interface_attribs (outfd, B1000000, 0);
enable_rts(outfd, rts); enable_rts(infd, rts);
std::thread in(input_function, infd); std::thread out(output_function, outfd);
in.join(); out.join();
close(infd); close(outfd); }
diff --git a/arch/arm/boot/dts/am335x-pocketbeagle.dts b/arch/arm/boot/dts/am335x-pocketbeagle.dts index 62fe5cab9fae..d9b8f09920a6 100644 --- a/arch/arm/boot/dts/am335x-pocketbeagle.dts +++ b/arch/arm/boot/dts/am335x-pocketbeagle.dts @@ -92,15 +92,6 @@ >; };
- spi0_pins: pinmux-spi0-pins { - pinctrl-single,pins = < - AM33XX_IOPAD(0x950, PIN_INPUT_PULLUP | MUX_MODE0) /* (A17) spi0_sclk.spi0_sclk */ - AM33XX_IOPAD(0x954, PIN_INPUT_PULLUP | MUX_MODE0) /* (B17) spi0_d0.spi0_d0 */ - AM33XX_IOPAD(0x958, PIN_INPUT_PULLUP | MUX_MODE0) /* (B16) spi0_d1.spi0_d1 */ - AM33XX_IOPAD(0x95c, PIN_INPUT_PULLUP | MUX_MODE0) /* (A16) spi0_cs0.spi0_cs0 */ - >; - }; - spi1_pins: pinmux-spi1-pins { pinctrl-single,pins = < AM33XX_IOPAD(0x964, PIN_INPUT_PULLUP | MUX_MODE4) /* (C18) eCAP0_in_PWM0_out.spi1_sclk */ @@ -126,6 +117,13 @@ >; };
+ uart2_pins: pinmux-uart2-pins { + pinctrl-single,pins = < + AM33XX_IOPAD(0x950, PIN_INPUT | MUX_MODE1) /* spi0_sclk.uart2_rxd */ + AM33XX_IOPAD(0x954, PIN_OUTPUT | MUX_MODE1) /* spi0_d0.uart2_txd */ + >; + }; + uart4_pins: pinmux-uart4-pins { pinctrl-single,pins = < AM33XX_IOPAD(0x870, PIN_INPUT_PULLUP | MUX_MODE6) /* (T17) gpmc_wait0.uart4_rxd */ @@ -199,6 +197,13 @@ status = "okay"; };
+&uart2 { + pinctrl-names = "default"; + pinctrl-0 = <&uart2_pins>; + + status = "okay"; +}; + &uart4 { pinctrl-names = "default"; pinctrl-0 = <&uart4_pins>;
On Sun, Dec 16, 2018 at 10:45:23PM +0100, Marek Vasut wrote:
I am unable to test it on such a short notice as I'm currently ill, so I cannot tell if your change breaks the OMAP3/AM335x boards or not. Given that there are very few CI20 boards in use, I'd like to ask you for some extra time to investigate this on the OMAP3 too.
I'm sorry to hear that you're ill, but your patch is getting awfully close to becoming part of a stable kernel release & it causes regressions. Even if it didn't break a board I use, I think the patch would be broken & risky for the reasons I outlined in my revert's commit message.
That's what the incremental releases are for, so that minor problems can get fixed there. Sure, it's great to have things perfect in the first release, but if that breaks other systems, too bad.
I don't think the purpose of stable kernels is to intentionally break systems in a stable release & backport fixes later...
Ultimately it's Greg's decision but it sounds like you're asking me to say it's OK to break the JZ4780 in a stable kernel with a patch that I think would be risky anyway, and I won't do that.
I am saying this revert breaks AM335x, so this is a stalemate. I had a discussion with Ezequiel (on CC) and he seems to have a different smaller patch coming for this problem.
Well, no. Your patch says commit 2bed8a8e7072 ("Clearing FIFOs in RS485 emulation mode causes subsequent transmits to break") broke RS485 on AM33x in v4.10, and it took 10 release cycles for someone to notice & fix it. You're asking to break a system that is used & working in order to fix a problem that seemingly wasn't even noticed for nearly 2 years.
Your fix breaks my system, but I outlined 4 reasons that I believe your patch to be wrong anyway - breaking my system is just one part of that.
I'll rely to Ezequiel's patch now, but it again addresses just one part of one of the 4 points I listed in the reasoning for my revert.
Thanks, Paul
On 12/16/2018 09:10 PM, Paul Burton wrote:
Commit f6aa5beb45be ("serial: 8250: Fix clearing FIFOs in RS485 mode again") makes a change to FIFO clearing code which its commit message suggests was intended to be specific to use with RS485 mode, however:
The change made does not just affect __do_stop_tx_rs485(), it also affects other uses of serial8250_clear_fifos() including paths for starting up, shutting down or auto-configuring a port regardless of whether it's an RS485 port or not.
It makes the assumption that resetting the FIFOs is a no-op when FIFOs are disabled, and as such it checks for this case & explicitly avoids setting the FIFO reset bits when the FIFO enable bit is clear. A reading of the PC16550D manual would suggest that this is OK since the FIFO should automatically be reset if it is later enabled, but we support many 16550-compatible devices and have never required this auto-reset behaviour for at least the whole git era. Starting to rely on it now seems risky, offers no benefit, and indeed breaks at least the Ingenic JZ4780's UARTs which reads garbage when the RX FIFO is enabled if we don't explicitly reset it.
By only resetting the FIFOs if they're enabled, the behaviour of serial8250_do_startup() during boot now depends on what the value of FCR is before the 8250 driver is probed. This in itself seems questionable and leaves us with FCR=0 & no FIFO reset if the UART was used by 8250_early, otherwise it depends upon what the bootloader left behind.
Although the naming of serial8250_clear_fifos() may be unclear, it is clear that callers of it expect that it will disable FIFOs. Both serial8250_do_startup() & serial8250_do_shutdown() contain comments to that effect, and other callers explicitly re-enable the FIFOs after calling serial8250_clear_fifos(). The premise of that patch that disabling the FIFOs is incorrect therefore seems wrong.
For these reasons, this reverts commit f6aa5beb45be ("serial: 8250: Fix clearing FIFOs in RS485 mode again").
Signed-off-by: Paul Burton paul.burton@mips.com Fixes: f6aa5beb45be ("serial: 8250: Fix clearing FIFOs in RS485 mode again"). Cc: Greg Kroah-Hartman gregkh@linuxfoundation.org Cc: Daniel Jedrychowski avistel@gmail.com Cc: Marek Vasut marex@denx.de Cc: linux-mips@vger.kernel.org Cc: linux-serial@vger.kernel.org Cc: stable stable@vger.kernel.org # 4.10+
I did suggest an alternative approach which would rename serial8250_clear_fifos() and split it into 2 variants - one that disables FIFOs & one that does not, then use the latter in __do_stop_tx_rs485():
https://lore.kernel.org/lkml/20181213014805.77u5dzydo23cm6fq@pburton-laptop/
However I have no access to the OMAP3 hardware that Marek's patch was attempting to fix & have heard nothing back with regards to him testing that approach, so here's a simple revert that fixes the Ingenic JZ4780.
I've marked for stable back to v4.10 presuming that this is how far the broken patch may be backported, given that this is where commit 2bed8a8e7072 ("Clearing FIFOs in RS485 emulation mode causes subsequent transmits to break") that it tried to fix was introduced.
OK, I tested this on AM335x / OMAP3 and the system is again broken, so that's a NAK.
drivers/tty/serial/8250/8250_port.c | 29 +++++------------------------ 1 file changed, 5 insertions(+), 24 deletions(-)
diff --git a/drivers/tty/serial/8250/8250_port.c b/drivers/tty/serial/8250/8250_port.c index f776b3eafb96..3f779d25ec0c 100644 --- a/drivers/tty/serial/8250/8250_port.c +++ b/drivers/tty/serial/8250/8250_port.c @@ -552,30 +552,11 @@ static unsigned int serial_icr_read(struct uart_8250_port *up, int offset) */ static void serial8250_clear_fifos(struct uart_8250_port *p) {
- unsigned char fcr;
- unsigned char clr_mask = UART_FCR_CLEAR_RCVR | UART_FCR_CLEAR_XMIT;
- if (p->capabilities & UART_CAP_FIFO) {
/*
* Make sure to avoid changing FCR[7:3] and ENABLE_FIFO bits.
* In case ENABLE_FIFO is not set, there is nothing to flush
* so just return. Furthermore, on certain implementations of
* the 8250 core, the FCR[7:3] bits may only be changed under
* specific conditions and changing them if those conditions
* are not met can have nasty side effects. One such core is
* the 8250-omap present in TI AM335x.
*/
fcr = serial_in(p, UART_FCR);
/* FIFO is not enabled, there's nothing to clear. */
if (!(fcr & UART_FCR_ENABLE_FIFO))
return;
fcr |= clr_mask;
serial_out(p, UART_FCR, fcr);
fcr &= ~clr_mask;
serial_out(p, UART_FCR, fcr);
serial_out(p, UART_FCR, UART_FCR_ENABLE_FIFO);
serial_out(p, UART_FCR, UART_FCR_ENABLE_FIFO |
UART_FCR_CLEAR_RCVR | UART_FCR_CLEAR_XMIT);
}serial_out(p, UART_FCR, 0);
} @@ -1467,7 +1448,7 @@ static void __do_stop_tx_rs485(struct uart_8250_port *p) * Enable previously disabled RX interrupts. */ if (!(p->port.rs485.flags & SER_RS485_RX_DURING_TX)) {
serial8250_clear_fifos(p);
serial8250_clear_and_reinit_fifos(p);
p->ier |= UART_IER_RLSI | UART_IER_RDI; serial_port_out(&p->port, UART_IER, p->ier);
Hi Marek,
On Sun, Dec 16, 2018 at 10:08:48PM +0100, Marek Vasut wrote:
I did suggest an alternative approach which would rename serial8250_clear_fifos() and split it into 2 variants - one that disables FIFOs & one that does not, then use the latter in __do_stop_tx_rs485():
https://lore.kernel.org/lkml/20181213014805.77u5dzydo23cm6fq@pburton-laptop/
However I have no access to the OMAP3 hardware that Marek's patch was attempting to fix & have heard nothing back with regards to him testing that approach, so here's a simple revert that fixes the Ingenic JZ4780.
I've marked for stable back to v4.10 presuming that this is how far the broken patch may be backported, given that this is where commit 2bed8a8e7072 ("Clearing FIFOs in RS485 emulation mode causes subsequent transmits to break") that it tried to fix was introduced.
OK, I tested this on AM335x / OMAP3 and the system is again broken, so that's a NAK.
To be clear - what did you test? This revert or the patch linked to above?
This revert would of course reintroduce your RS485 issue because it just undoes your change.
Either way, commit f6aa5beb45be ("serial: 8250: Fix clearing FIFOs in RS485 mode again") breaks systems that worked before it so at this late stage in the 4.20 cycle a revert would make sense to me. If that breaks RS85 on OMAP3 then my question would be how much can anyone really care if nobody noticed since v4.10? And why should that lead to you breaking the JZ4780 which has been discovered before a stable kernel release includes the breakage?
Thanks, Paul
On 12/16/2018 10:39 PM, Paul Burton wrote:
Hi Marek,
Hi,
On Sun, Dec 16, 2018 at 10:08:48PM +0100, Marek Vasut wrote:
I did suggest an alternative approach which would rename serial8250_clear_fifos() and split it into 2 variants - one that disables FIFOs & one that does not, then use the latter in __do_stop_tx_rs485():
https://lore.kernel.org/lkml/20181213014805.77u5dzydo23cm6fq@pburton-laptop/
However I have no access to the OMAP3 hardware that Marek's patch was attempting to fix & have heard nothing back with regards to him testing that approach, so here's a simple revert that fixes the Ingenic JZ4780.
I've marked for stable back to v4.10 presuming that this is how far the broken patch may be backported, given that this is where commit 2bed8a8e7072 ("Clearing FIFOs in RS485 emulation mode causes subsequent transmits to break") that it tried to fix was introduced.
OK, I tested this on AM335x / OMAP3 and the system is again broken, so that's a NAK.
To be clear - what did you test? This revert or the patch linked to above?
This revert would of course reintroduce your RS485 issue because it just undoes your change.
The revert. Which of the two patches do you need me to test.
Either way, commit f6aa5beb45be ("serial: 8250: Fix clearing FIFOs in RS485 mode again") breaks systems that worked before it so at this late stage in the 4.20 cycle a revert would make sense to me. If that breaks RS85 on OMAP3 then my question would be how much can anyone really care if nobody noticed since v4.10? And why should that lead to you breaking the JZ4780 which has been discovered before a stable kernel release includes the breakage?
There's always a .y release where this can be properly investigated and solved, instead of breaking one platform or the other.
Then again, see the patch from Ezequiel that was just posted, I think it might be a far better solution.
Hi Marek,
On Sun, Dec 16, 2018 at 11:01:18PM +0100, Marek Vasut wrote:
I did suggest an alternative approach which would rename serial8250_clear_fifos() and split it into 2 variants - one that disables FIFOs & one that does not, then use the latter in __do_stop_tx_rs485():
https://lore.kernel.org/lkml/20181213014805.77u5dzydo23cm6fq@pburton-laptop/
However I have no access to the OMAP3 hardware that Marek's patch was attempting to fix & have heard nothing back with regards to him testing that approach, so here's a simple revert that fixes the Ingenic JZ4780.
I've marked for stable back to v4.10 presuming that this is how far the broken patch may be backported, given that this is where commit 2bed8a8e7072 ("Clearing FIFOs in RS485 emulation mode causes subsequent transmits to break") that it tried to fix was introduced.
OK, I tested this on AM335x / OMAP3 and the system is again broken, so that's a NAK.
To be clear - what did you test? This revert or the patch linked to above?
This revert would of course reintroduce your RS485 issue because it just undoes your change.
The revert. Which of the two patches do you need me to test.
The one in the email I sent on Thursday 13th at 01:48:06 UTC, linked to at lore.kernel.org in the quote right above:
https://lore.kernel.org/lkml/20181213014805.77u5dzydo23cm6fq@pburton-laptop/
You replied with comments on the patch, you just never tested it or never told me if you did. The lack of response means I don't know whether that potential patch even still works for your system, hence the revert.
Thanks, Paul
On 12/16/2018 11:28 PM, Paul Burton wrote:
Hi Marek,
On Sun, Dec 16, 2018 at 11:01:18PM +0100, Marek Vasut wrote:
I did suggest an alternative approach which would rename serial8250_clear_fifos() and split it into 2 variants - one that disables FIFOs & one that does not, then use the latter in __do_stop_tx_rs485():
https://lore.kernel.org/lkml/20181213014805.77u5dzydo23cm6fq@pburton-laptop/
However I have no access to the OMAP3 hardware that Marek's patch was attempting to fix & have heard nothing back with regards to him testing that approach, so here's a simple revert that fixes the Ingenic JZ4780.
I've marked for stable back to v4.10 presuming that this is how far the broken patch may be backported, given that this is where commit 2bed8a8e7072 ("Clearing FIFOs in RS485 emulation mode causes subsequent transmits to break") that it tried to fix was introduced.
OK, I tested this on AM335x / OMAP3 and the system is again broken, so that's a NAK.
To be clear - what did you test? This revert or the patch linked to above?
This revert would of course reintroduce your RS485 issue because it just undoes your change.
The revert. Which of the two patches do you need me to test.
The one in the email I sent on Thursday 13th at 01:48:06 UTC, linked to at lore.kernel.org in the quote right above:
https://lore.kernel.org/lkml/20181213014805.77u5dzydo23cm6fq@pburton-laptop/
You replied with comments on the patch, you just never tested it or never told me if you did. The lack of response means I don't know whether that potential patch even still works for your system, hence the revert.
I shared the entire testcase, which now fails on AM335x due to this revert. Is there any progress on a proper fix from your side which does not break the AM335x ?
Hi Marek,
On Fri, Dec 21, 2018 at 09:08:28PM +0100, Marek Vasut wrote:
On 12/16/2018 11:28 PM, Paul Burton wrote:
On Sun, Dec 16, 2018 at 11:01:18PM +0100, Marek Vasut wrote:
I did suggest an alternative approach which would rename serial8250_clear_fifos() and split it into 2 variants - one that disables FIFOs & one that does not, then use the latter in __do_stop_tx_rs485():
https://lore.kernel.org/lkml/20181213014805.77u5dzydo23cm6fq@pburton-laptop/
However I have no access to the OMAP3 hardware that Marek's patch was attempting to fix & have heard nothing back with regards to him testing that approach, so here's a simple revert that fixes the Ingenic JZ4780.
I've marked for stable back to v4.10 presuming that this is how far the broken patch may be backported, given that this is where commit 2bed8a8e7072 ("Clearing FIFOs in RS485 emulation mode causes subsequent transmits to break") that it tried to fix was introduced.
OK, I tested this on AM335x / OMAP3 and the system is again broken, so that's a NAK.
To be clear - what did you test? This revert or the patch linked to above?
This revert would of course reintroduce your RS485 issue because it just undoes your change.
The revert. Which of the two patches do you need me to test.
The one in the email I sent on Thursday 13th at 01:48:06 UTC, linked to at lore.kernel.org in the quote right above:
https://lore.kernel.org/lkml/20181213014805.77u5dzydo23cm6fq@pburton-laptop/
You replied with comments on the patch, you just never tested it or never told me if you did. The lack of response means I don't know whether that potential patch even still works for your system, hence the revert.
I shared the entire testcase, which now fails on AM335x due to this revert. Is there any progress on a proper fix from your side which does not break the AM335x ?
No.
Let's be clear - I didn't break your AM335x system, your broken patch says that Daniel did with a commit applied back in v4.10. As such I don't consider it my responsibility to fix your problem at all, I don't have any access to the hardware anyway & I won't be buying hardware to fix a bug for you.
Despite all that I did write a patch which I expect would have solved the problem for both of us, which is linked *twice* in the quoted emails above & which as far as I can tell you *still* have not tested. I can only surmise that you're trying deliberately to be annoying at this point.
If you want to try the patch I already wrote, and confirm whether it actually works for you, then let's talk. Otherwise we're done here.
Thanks, Paul
On 12/21/2018 10:08 PM, Paul Burton wrote:
Hi Marek,
On Fri, Dec 21, 2018 at 09:08:28PM +0100, Marek Vasut wrote:
On 12/16/2018 11:28 PM, Paul Burton wrote:
On Sun, Dec 16, 2018 at 11:01:18PM +0100, Marek Vasut wrote:
> I did suggest an alternative approach which would rename > serial8250_clear_fifos() and split it into 2 variants - one that > disables FIFOs & one that does not, then use the latter in > __do_stop_tx_rs485(): > > https://lore.kernel.org/lkml/20181213014805.77u5dzydo23cm6fq@pburton-laptop/ > > However I have no access to the OMAP3 hardware that Marek's patch was > attempting to fix & have heard nothing back with regards to him testing > that approach, so here's a simple revert that fixes the Ingenic JZ4780. > > I've marked for stable back to v4.10 presuming that this is how far the > broken patch may be backported, given that this is where commit > 2bed8a8e7072 ("Clearing FIFOs in RS485 emulation mode causes subsequent > transmits to break") that it tried to fix was introduced.
OK, I tested this on AM335x / OMAP3 and the system is again broken, so that's a NAK.
To be clear - what did you test? This revert or the patch linked to above?
This revert would of course reintroduce your RS485 issue because it just undoes your change.
The revert. Which of the two patches do you need me to test.
The one in the email I sent on Thursday 13th at 01:48:06 UTC, linked to at lore.kernel.org in the quote right above:
https://lore.kernel.org/lkml/20181213014805.77u5dzydo23cm6fq@pburton-laptop/
You replied with comments on the patch, you just never tested it or never told me if you did. The lack of response means I don't know whether that potential patch even still works for your system, hence the revert.
I shared the entire testcase, which now fails on AM335x due to this revert. Is there any progress on a proper fix from your side which does not break the AM335x ?
No.
Let's be clear - I didn't break your AM335x system, your broken patch says that Daniel did with a commit applied back in v4.10. As such I don't consider it my responsibility to fix your problem at all, I don't have any access to the hardware anyway & I won't be buying hardware to fix a bug for you.
Despite all that I did write a patch which I expect would have solved the problem for both of us, which is linked *twice* in the quoted emails above & which as far as I can tell you *still* have not tested. I can only surmise that you're trying deliberately to be annoying at this point.
If you want to try the patch I already wrote, and confirm whether it actually works for you, then let's talk. Otherwise we're done here.
Understood. However I did test your patch, but it was generating spurious IRQs and overruns (ttyS ttyS2: 91 input overrun(s)) on the AM335x, so that's not a proper solution.
I even brought the CI20 V2 I have back to life (yes, I bought one). The patch Ezequiel posted did fix the problem on the CI20, and did not break the AM335x, so I'd prefer if it was revisited instead of a heavy-handed revert.
And I'd prefer to keep this thread alive, since I am still convinced that the FIFO handling code is wrong. Moreover, considering the UME bit on JZ4780 in FCR, the original code should confuse the UART on the JZ4780 too, although this might be hidden by some other surrounding code specific to the 8250 core on the JZ4780.
I am also on mipslinux IRC channel, in case you want to discuss this.
Hi Marek,
On Fri, Dec 21, 2018 at 11:02:03PM +0100, Marek Vasut wrote:
I shared the entire testcase, which now fails on AM335x due to this revert. Is there any progress on a proper fix from your side which does not break the AM335x ?
No.
Let's be clear - I didn't break your AM335x system, your broken patch says that Daniel did with a commit applied back in v4.10. As such I don't consider it my responsibility to fix your problem at all, I don't have any access to the hardware anyway & I won't be buying hardware to fix a bug for you.
Despite all that I did write a patch which I expect would have solved the problem for both of us, which is linked *twice* in the quoted emails above & which as far as I can tell you *still* have not tested. I can only surmise that you're trying deliberately to be annoying at this point.
If you want to try the patch I already wrote, and confirm whether it actually works for you, then let's talk. Otherwise we're done here.
Understood. However I did test your patch, but it was generating spurious IRQs and overruns (ttyS ttyS2: 91 input overrun(s)) on the AM335x, so that's not a proper solution.
OK, thanks for testing, and let's figure out what's going on. Since the revert is in now I'd suggest starting from there - ie. approximately from the code we've had since v4.10.
I even brought the CI20 V2 I have back to life (yes, I bought one). The patch Ezequiel posted did fix the problem on the CI20, and did not break the AM335x, so I'd prefer if it was revisited instead of a heavy-handed revert.
As I described in an earlier email & on IRC, Ezequiel's one-liner does not address all of the problems listed in my revert's commit message. But again, since the revert is now in I suggest starting from there.
As neither a maintainer or significant contributor to drivers/tty/serial, nor the author of the patch applied in v4.10 which you say broke AM335x, nor someone with access to the affected hardware, I'm probably not the best placed person to help you here - all I can do is offer general debug suggestions. With that in mind, here are some questions & ideas I have:
1) Your message for commit f6aa5beb45be ("serial: 8250: Fix clearing FIFOs in RS485 mode again") suggests that the problem you're seeing is specific to the __do_stop_tx_rs485() path. Does changing only __do_stop_tx_rs485() to clear the FIFOs without disabling them, without modifying other users of serial8250_clear_fifos(), fix your problem? ie. does the following work?
diff --git a/drivers/tty/serial/8250/8250_port.c b/drivers/tty/serial/8250/8250_port.c index f776b3eafb96..18d2a1a93f03 100644 --- a/drivers/tty/serial/8250/8250_port.c +++ b/drivers/tty/serial/8250/8250_port.c @@ -1458,17 +1458,21 @@ static void serial8250_stop_rx(struct uart_port *port) }
static void __do_stop_tx_rs485(struct uart_8250_port *p) { + unsigned char fcr; + serial8250_em485_rts_after_send(p);
/* * Empty the RX FIFO, we are not interested in anything * received during the half-duplex transmission. * Enable previously disabled RX interrupts. */ if (!(p->port.rs485.flags & SER_RS485_RX_DURING_TX)) { - serial8250_clear_fifos(p); + fcr = serial_port_in(&p->port, UART_FCR); + serial_port_out(&p->port, UART_FCR, + fcr | UART_FCR_CLEAR_RCVR | UART_FCR_CLEAR_XMIT);
p->ier |= UART_IER_RLSI | UART_IER_RDI; serial_port_out(&p->port, UART_IER, p->ier); }
Based on the comment above maybe leaving UART_FCR_CLEAR_XMIT out of that might make sense too..?
Having __do_stop_tx_rs485() not share the FIFO clearing code with other callers may make sense given that so far as I can see __do_stop_tx_rs485() runs whilst the port is in use & other callers of serial8250_clear_fifos() do not.
2) In an earlier email you said "The problem the original patch fixed was that too many bits were cleared in the FCR on OMAP3/AM335x". Could you clarify which bits we're talking about? From the AM335x reference manual I can only see the DMA_MODE bit & the [RT]X_FIFO_TRIG fields. Do you know which are problematic & why? In your commit message you mention the AM335x UART swallowing & retransmitting bytes - which field changing causes that?
And I'd prefer to keep this thread alive, since I am still convinced that the FIFO handling code is wrong. Moreover, considering the UME bit on JZ4780 in FCR, the original code should confuse the UART on the JZ4780 too, although this might be hidden by some other surrounding code specific to the 8250 core on the JZ4780.
For completeness, as I said in an earlier email in the thread and as we've since discussed on IRC, this is incorrect because ingenic_uart_serial_out() unconditionally sets the UME bit. As such the UME bit is not a problem, and JZ4780 works fine both before your patch & after the revert.
Thanks, Paul
linux-stable-mirror@lists.linaro.org