From: Hugo Villeneuve <hvilleneuve@dimonoff.com>
If an error occurs during probing, the sc16is7xx_lines bitfield may be left in a state that doesn't represent the correct state of lines allocation.
For example, in a system with two SC16 devices, if an error occurs only during probing of channel (port) B of the second device, sc16is7xx_lines final state will be 00001011b instead of the expected 00000011b.
This is caused in part because of the "i--" in the for/loop located in the out_ports: error path.
Fix this by checking the return value of uart_add_one_port() and set line allocation bit only if this was successful. This allows the refactor of the obfuscated for(i--...) loop in the error path, and properly call uart_remove_one_port() only when needed, and properly unset line allocation bits.
Also use same mechanism in remove() when calling uart_remove_one_port().
Fixes: c64349722d14 ("sc16is7xx: support multiple devices")
Cc: stable@vger.kernel.org
Cc: Yury Norov <yury.norov@gmail.com>
Signed-off-by: Hugo Villeneuve <hvilleneuve@dimonoff.com>
---
There is already a patch by Yury Norov <yury.norov@gmail.com> to simplify sc16is7xx_alloc_line():
https://lore.kernel.org/all/20231212022749.625238-30-yury.norov@gmail.com/

Since my patch gets rid of sc16is7xx_alloc_line() entirely, it would make Yury's patch unnecessary.
---
 drivers/tty/serial/sc16is7xx.c | 44 ++++++++++++++--------------------
 1 file changed, 18 insertions(+), 26 deletions(-)

diff --git a/drivers/tty/serial/sc16is7xx.c b/drivers/tty/serial/sc16is7xx.c
index b585663c1e6e..b92fd01cfeec 100644
--- a/drivers/tty/serial/sc16is7xx.c
+++ b/drivers/tty/serial/sc16is7xx.c
@@ -407,19 +407,6 @@ static void sc16is7xx_port_update(struct uart_port *port, u8 reg,
 	regmap_update_bits(one->regmap, reg, mask, val);
 }
 
-static int sc16is7xx_alloc_line(void)
-{
-	int i;
-
-	BUILD_BUG_ON(SC16IS7XX_MAX_DEVS > BITS_PER_LONG);
-
-	for (i = 0; i < SC16IS7XX_MAX_DEVS; i++)
-		if (!test_and_set_bit(i, &sc16is7xx_lines))
-			break;
-
-	return i;
-}
-
 static void sc16is7xx_power(struct uart_port *port, int on)
 {
 	sc16is7xx_port_update(port, SC16IS7XX_IER_REG,
@@ -1550,6 +1537,13 @@ static int sc16is7xx_probe(struct device *dev,
 			     SC16IS7XX_IOCONTROL_SRESET_BIT);
 
 	for (i = 0; i < devtype->nr_uart; ++i) {
+		s->p[i].port.line = find_first_zero_bit(&sc16is7xx_lines,
+							SC16IS7XX_MAX_DEVS);
+		if (s->p[i].port.line >= SC16IS7XX_MAX_DEVS) {
+			ret = -ERANGE;
+			goto out_ports;
+		}
+
 		/* Initialize port data */
 		s->p[i].port.dev	= dev;
 		s->p[i].port.irq	= irq;
@@ -1569,14 +1563,8 @@ static int sc16is7xx_probe(struct device *dev,
 		s->p[i].port.rs485_supported = sc16is7xx_rs485_supported;
 		s->p[i].port.ops	= &sc16is7xx_ops;
 		s->p[i].old_mctrl	= 0;
-		s->p[i].port.line	= sc16is7xx_alloc_line();
 		s->p[i].regmap = regmaps[i];
 
-		if (s->p[i].port.line >= SC16IS7XX_MAX_DEVS) {
-			ret = -ENOMEM;
-			goto out_ports;
-		}
-
 		mutex_init(&s->p[i].efr_lock);
 
 		ret = uart_get_rs485_mode(&s->p[i].port);
@@ -1594,8 +1582,13 @@ static int sc16is7xx_probe(struct device *dev,
 		kthread_init_work(&s->p[i].tx_work, sc16is7xx_tx_proc);
 		kthread_init_work(&s->p[i].reg_work, sc16is7xx_reg_proc);
 		kthread_init_delayed_work(&s->p[i].ms_work, sc16is7xx_ms_proc);
+
 		/* Register port */
-		uart_add_one_port(&sc16is7xx_uart, &s->p[i].port);
+		ret = uart_add_one_port(&sc16is7xx_uart, &s->p[i].port);
+		if (ret)
+			goto out_ports;
+
+		set_bit(s->p[i].port.line, &sc16is7xx_lines);
 
 		/* Enable EFR */
 		sc16is7xx_port_write(&s->p[i].port, SC16IS7XX_LCR_REG,
@@ -1653,10 +1646,9 @@ static int sc16is7xx_probe(struct device *dev,
 #endif
 
 out_ports:
-	for (i--; i >= 0; i--) {
-		uart_remove_one_port(&sc16is7xx_uart, &s->p[i].port);
-		clear_bit(s->p[i].port.line, &sc16is7xx_lines);
-	}
+	for (i = 0; i < devtype->nr_uart; i++)
+		if (test_and_clear_bit(s->p[i].port.line, &sc16is7xx_lines))
+			uart_remove_one_port(&sc16is7xx_uart, &s->p[i].port);
 
 	kthread_stop(s->kworker_task);
 
@@ -1683,8 +1675,8 @@ static void sc16is7xx_remove(struct device *dev)
 
 	for (i = 0; i < s->devtype->nr_uart; i++) {
 		kthread_cancel_delayed_work_sync(&s->p[i].ms_work);
-		uart_remove_one_port(&sc16is7xx_uart, &s->p[i].port);
-		clear_bit(s->p[i].port.line, &sc16is7xx_lines);
+		if (test_and_clear_bit(s->p[i].port.line, &sc16is7xx_lines))
+			uart_remove_one_port(&sc16is7xx_uart, &s->p[i].port);
 		sc16is7xx_power(&s->p[i].port, 0);
 	}

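The 00001011b versus 00000011b example from the commit message can be replayed in user space. The following is only a rough model, not driver code: `lines`, `MAX_LINES`, `probe_old()`, `probe_new()`, `run_old()` and `run_new()` are hypothetical names, the bit helpers are plain non-atomic stand-ins for the kernel bitops, and the failure is assumed to hit channel B after its line was picked but before the port was registered.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_LINES 8	/* stand-in for SC16IS7XX_MAX_DEVS */
#define NR_UART   2	/* channels A and B of one device */
#define FAIL_PORT 1	/* this sketch only models a failure on channel B */

static unsigned long lines;	/* stand-in for sc16is7xx_lines */
static int ports_removed;	/* counts modelled uart_remove_one_port() calls */

static bool test_and_set(int bit)
{
	assert(bit >= 0 && bit < MAX_LINES);
	bool old = lines & (1UL << bit);
	lines |= 1UL << bit;
	return old;
}

static bool test_and_clear(int bit)
{
	assert(bit >= 0 && bit < MAX_LINES);
	bool old = lines & (1UL << bit);
	lines &= ~(1UL << bit);
	return old;
}

static int first_zero(void)
{
	int i;

	for (i = 0; i < MAX_LINES; i++)
		if (!(lines & (1UL << i)))
			break;
	return i;
}

/* Old scheme: the bit is set as soon as the line is allocated. */
static int probe_old(void)
{
	int line[NR_UART] = { 0 };
	int i;

	for (i = 0; i < NR_UART; i++) {
		/* models sc16is7xx_alloc_line() */
		for (line[i] = 0; line[i] < MAX_LINES; line[i]++)
			if (!test_and_set(line[i]))
				break;
		if (i == FAIL_PORT)
			goto out_ports;	/* error after the bit was set */
	}
	return 0;

out_ports:
	/* "i--" skips the failing port, so its bit is never cleared */
	for (i--; i >= 0; i--) {
		ports_removed++;
		test_and_clear(line[i]);
	}
	return -1;
}

/* New scheme: the bit is set only once the port is registered. */
static int probe_new(void)
{
	int line[NR_UART] = { 0 };
	int i;

	for (i = 0; i < NR_UART; i++) {
		line[i] = first_zero();	/* picks a line, sets nothing */
		if (i == FAIL_PORT)
			goto out_ports;	/* error before registration */
		test_and_set(line[i]);	/* models set_bit() after a successful add */
	}
	return 0;

out_ports:
	for (i = 0; i < NR_UART; i++)
		if (test_and_clear(line[i]))
			ports_removed++;
	return -1;
}

/* The first device already owns lines 0 and 1 (00000011b). */
unsigned long run_old(void) { lines = 0x03; probe_old(); return lines; }
unsigned long run_new(void) { lines = 0x03; probe_new(); return lines; }
```

Under these assumptions, run_old() returns 0x0b (00001011b, the stale bit 3 described above) and run_new() returns 0x03 (00000011b).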
On Tue, Dec 19, 2023 at 12:18:46PM -0500, Hugo Villeneuve wrote:
> From: Hugo Villeneuve <hvilleneuve@dimonoff.com>
> If an error occurs during probing, the sc16is7xx_lines bitfield may be left in a state that doesn't represent the correct state of lines allocation.
> For example, in a system with two SC16 devices, if an error occurs only during probing of channel (port) B of the second device, sc16is7xx_lines final state will be 00001011b instead of the expected 00000011b.
> This is caused in part because of the "i--" in the for/loop located in the out_ports: error path.
> Fix this by checking the return value of uart_add_one_port() and set line allocation bit only if this was successful. This allows the refactor of the obfuscated for(i--...) loop in the error path, and properly call uart_remove_one_port() only when needed, and properly unset line allocation bits.
> Also use same mechanism in remove() when calling uart_remove_one_port().

Yes, this seems to be the correct one to fix the problem described in the patch 1. I dunno why the patch 1 even exists.

As for Yury's patch, you are doing fixes, so your stuff has priority on his.

On Wed, 20 Dec 2023 17:40:42 +0200 Andy Shevchenko <andriy.shevchenko@intel.com> wrote:
> On Tue, Dec 19, 2023 at 12:18:46PM -0500, Hugo Villeneuve wrote:
> > From: Hugo Villeneuve <hvilleneuve@dimonoff.com>
> > If an error occurs during probing, the sc16is7xx_lines bitfield may be left in a state that doesn't represent the correct state of lines allocation.
> > For example, in a system with two SC16 devices, if an error occurs only during probing of channel (port) B of the second device, sc16is7xx_lines final state will be 00001011b instead of the expected 00000011b.
> > This is caused in part because of the "i--" in the for/loop located in the out_ports: error path.
> > Fix this by checking the return value of uart_add_one_port() and set line allocation bit only if this was successful. This allows the refactor of the obfuscated for(i--...) loop in the error path, and properly call uart_remove_one_port() only when needed, and properly unset line allocation bits.
> > Also use same mechanism in remove() when calling uart_remove_one_port().
>
> Yes, this seems to be the correct one to fix the problem described in the patch 1. I dunno why the patch 1 even exists.

Hi, this will indeed fix the problem described in patch 1.
However, if I remove patch 1, and I simulate the same probe error as described in patch 1, now we get stuck forever when trying to remove the driver. This is something that I observed before and that patch 1 also corrected.
The problem is caused in sc16is7xx_remove() when calling this function
kthread_flush_worker(&s->kworker);
I am not sure how best to handle that without patch 1.
Hugo Villeneuve

On Thu, Dec 21, 2023 at 10:56:39AM -0500, Hugo Villeneuve wrote:
> On Wed, 20 Dec 2023 17:40:42 +0200 Andy Shevchenko <andriy.shevchenko@intel.com> wrote:
> > On Tue, Dec 19, 2023 at 12:18:46PM -0500, Hugo Villeneuve wrote:

...

> > Yes, this seems to be the correct one to fix the problem described in the patch 1. I dunno why the patch 1 even exists.
>
> Hi, this will indeed fix the problem described in patch 1.
> However, if I remove patch 1, and I simulate the same probe error as described in patch 1, now we get stuck forever when trying to remove the driver. This is something that I observed before and that patch 1 also corrected.
> The problem is caused in sc16is7xx_remove() when calling this function
> kthread_flush_worker(&s->kworker);
> I am not sure how best to handle that without patch 1.

So, it means we need to root cause this issue. Because patch 1 looks really bogus.

On Thu, 21 Dec 2023 10:56:39 -0500 Hugo Villeneuve <hugo@hugovil.com> wrote:
> On Wed, 20 Dec 2023 17:40:42 +0200 Andy Shevchenko <andriy.shevchenko@intel.com> wrote:
> > On Tue, Dec 19, 2023 at 12:18:46PM -0500, Hugo Villeneuve wrote:
> > > From: Hugo Villeneuve <hvilleneuve@dimonoff.com>
> > > If an error occurs during probing, the sc16is7xx_lines bitfield may be left in a state that doesn't represent the correct state of lines allocation.
> > > For example, in a system with two SC16 devices, if an error occurs only during probing of channel (port) B of the second device, sc16is7xx_lines final state will be 00001011b instead of the expected 00000011b.
> > > This is caused in part because of the "i--" in the for/loop located in the out_ports: error path.
> > > Fix this by checking the return value of uart_add_one_port() and set line allocation bit only if this was successful. This allows the refactor of the obfuscated for(i--...) loop in the error path, and properly call uart_remove_one_port() only when needed, and properly unset line allocation bits.
> > > Also use same mechanism in remove() when calling uart_remove_one_port().
> >
> > Yes, this seems to be the correct one to fix the problem described in the patch 1. I dunno why the patch 1 even exists.
>
> Hi, this will indeed fix the problem described in patch 1.
> However, if I remove patch 1, and I simulate the same probe error as described in patch 1, now we get stuck forever when trying to remove the driver. This is something that I observed before and that patch 1 also corrected.
> The problem is caused in sc16is7xx_remove() when calling this function
> kthread_flush_worker(&s->kworker);
> I am not sure how best to handle that without patch 1.

Also, if we manage to get past kthread_flush_worker() and kthread_stop() (commented out for testing purposes), we get another bug:

# rmmod sc16is7xx
...
crystal-duart-24m already disabled
WARNING: CPU: 2 PID: 340 at drivers/clk/clk.c:1090 clk_core_disable+0x1b0/0x1e0
...
Call trace:
 clk_core_disable+0x1b0/0x1e0
 clk_disable+0x38/0x60
 sc16is7xx_remove+0x1e4/0x240 [sc16is7xx]

This one is caused by calling clk_disable_unprepare(). But clk_disable_unprepare() has already been called in probe error handling code. Patch 1 also fixed this...
Hugo Villeneuve

On Thu, Dec 21, 2023 at 11:13:37AM -0500, Hugo Villeneuve wrote:
> On Thu, 21 Dec 2023 10:56:39 -0500 Hugo Villeneuve <hugo@hugovil.com> wrote:
> > On Wed, 20 Dec 2023 17:40:42 +0200 Andy Shevchenko <andriy.shevchenko@intel.com> wrote:

...

> > this will indeed fix the problem described in patch 1.
> > However, if I remove patch 1, and I simulate the same probe error as described in patch 1, now we get stuck forever when trying to remove the driver. This is something that I observed before and that patch 1 also corrected.
> > The problem is caused in sc16is7xx_remove() when calling this function
> > kthread_flush_worker(&s->kworker);
> > I am not sure how best to handle that without patch 1.
>
> Also, if we manage to get past kthread_flush_worker() and kthread_stop() (commented out for testing purposes), we get another bug:
>
> # rmmod sc16is7xx
> ...
> crystal-duart-24m already disabled
> WARNING: CPU: 2 PID: 340 at drivers/clk/clk.c:1090 clk_core_disable+0x1b0/0x1e0
> ...
> Call trace:
>  clk_core_disable+0x1b0/0x1e0
>  clk_disable+0x38/0x60
>  sc16is7xx_remove+0x1e4/0x240 [sc16is7xx]
>
> This one is caused by calling clk_disable_unprepare(). But clk_disable_unprepare() has already been called in probe error handling code. Patch 1 also fixed this...

Word "fixed" is incorrect. "Papered over" is what it did.

On Thu, 21 Dec 2023 18:16:40 +0200 Andy Shevchenko <andriy.shevchenko@intel.com> wrote:
> On Thu, Dec 21, 2023 at 11:13:37AM -0500, Hugo Villeneuve wrote:
> > On Thu, 21 Dec 2023 10:56:39 -0500 Hugo Villeneuve <hugo@hugovil.com> wrote:
> > > On Wed, 20 Dec 2023 17:40:42 +0200 Andy Shevchenko <andriy.shevchenko@intel.com> wrote:
>
> ...
>
> > > this will indeed fix the problem described in patch 1.
> > > However, if I remove patch 1, and I simulate the same probe error as described in patch 1, now we get stuck forever when trying to remove the driver. This is something that I observed before and that patch 1 also corrected.
> > > The problem is caused in sc16is7xx_remove() when calling this function
> > > kthread_flush_worker(&s->kworker);
> > > I am not sure how best to handle that without patch 1.
> >
> > Also, if we manage to get past kthread_flush_worker() and kthread_stop() (commented out for testing purposes), we get another bug:
> >
> > # rmmod sc16is7xx
> > ...
> > crystal-duart-24m already disabled
> > WARNING: CPU: 2 PID: 340 at drivers/clk/clk.c:1090 clk_core_disable+0x1b0/0x1e0
> > ...
> > Call trace:
> >  clk_core_disable+0x1b0/0x1e0
> >  clk_disable+0x38/0x60
> >  sc16is7xx_remove+0x1e4/0x240 [sc16is7xx]
> >
> > This one is caused by calling clk_disable_unprepare(). But clk_disable_unprepare() has already been called in probe error handling code. Patch 1 also fixed this...
>
> Word "fixed" is incorrect. "Papered over" is what it did.

Hi, I just found the problem, and it was in my bug simulation, not in the driver itself. When I simulated the bug, I forgot to set "ret" to an error code, and thus I returned 0 at the end of sc16is7xx_probe(). This is why sc16is7xx_remove() was called when unloading the driver, even though it should not have been.
If I simulate my probe error and return "-EINVAL" at the end of sc16is7xx_probe(), sc16is7xx_remove() is not called when unloading the driver.
Sorry for the noise. I will drop patch 1, leave the patch "fix invalid sc16is7xx_lines bitfield in case of probe error" as it is, and simply remove the comments about Yury's patch.
Hugo.
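
The simulation mistake described in this last message can be sketched in user space. This is a hedged model only: `probe_model()`, `remove_model()`, `load_then_unload()` and the clock counters are hypothetical stand-ins, and the single assumption about the driver core is the one stated above, namely that remove() is called on unload only when probe() returned 0.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

static int clk_enable_count;	/* models the clock's enable count */
static int unbalanced_disables;	/* models "already disabled" warnings */

static void clk_enable_model(void)
{
	clk_enable_count++;
}

static void clk_disable_model(void)
{
	if (clk_enable_count == 0) {
		unbalanced_disables++;	/* disable without a matching enable */
		return;
	}
	clk_enable_count--;
}

/*
 * Models a probe() whose error path is forced for testing.
 * set_ret == false reproduces the faulty simulation: the cleanup
 * runs, but "ret" is left at 0, so the caller sees success.
 */
static int probe_model(bool set_ret)
{
	int ret = 0;

	clk_enable_model();

	/* simulated failure */
	if (set_ret)
		ret = -EINVAL;

	/* error path: undo what probe did so far */
	clk_disable_model();
	return ret;
}

static void remove_model(void)
{
	clk_disable_model();
}

/* Returns the number of unbalanced clock disables seen on unload. */
int load_then_unload(bool set_ret)
{
	clk_enable_count = 0;
	unbalanced_disables = 0;

	/* remove() only runs for a device whose probe() succeeded */
	if (probe_model(set_ret) == 0)
		remove_model();

	return unbalanced_disables;
}
```

In this model, load_then_unload(false) reports one unbalanced disable, matching the clk warning seen on rmmod, while load_then_unload(true) reports none because remove_model() is never reached.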