If interrupt comes late, during probe error path or device remove (could be triggered with CONFIG_DEBUG_SHIRQ), the interrupt handler dspi_interrupt() will access registers with the clock being disabled. This leads to external abort on non-linefetch on Toradex Colibri VF50 module (with Vybrid VF5xx):
$ echo 4002d000.spi > /sys/devices/platform/soc/40000000.bus/4002d000.spi/driver/unbind
Unhandled fault: external abort on non-linefetch (0x1008) at 0x8887f02c Internal error: : 1008 [#1] ARM CPU: 0 PID: 136 Comm: sh Not tainted 5.7.0-next-20200610-00009-g5c913fa0f9c5-dirty #74 Hardware name: Freescale Vybrid VF5xx/VF6xx (Device Tree) (regmap_mmio_read32le) from [<8061885c>] (regmap_mmio_read+0x48/0x68) (regmap_mmio_read) from [<8060e3b8>] (_regmap_bus_reg_read+0x24/0x28) (_regmap_bus_reg_read) from [<80611c50>] (_regmap_read+0x70/0x1c0) (_regmap_read) from [<80611dec>] (regmap_read+0x4c/0x6c) (regmap_read) from [<80678ca0>] (dspi_interrupt+0x3c/0xa8) (dspi_interrupt) from [<8017acec>] (free_irq+0x26c/0x3cc) (free_irq) from [<8017dcec>] (devm_irq_release+0x1c/0x20) (devm_irq_release) from [<805f98ec>] (release_nodes+0x1e4/0x298) (release_nodes) from [<805f9ac8>] (devres_release_all+0x40/0x60) (devres_release_all) from [<805f5134>] (device_release_driver_internal+0x108/0x1ac) (device_release_driver_internal) from [<805f521c>] (device_driver_detach+0x20/0x24)
Fixes: 349ad66c0ab0 ("spi:Add Freescale DSPI driver for Vybrid VF610 platform") Cc: stable@vger.kernel.org Signed-off-by: Krzysztof Kozlowski krzk@kernel.org
---
This is an follow up of my other patch for I2C IMX driver [1]. Let's fix the issues consistently.
[1] https://lore.kernel.org/lkml/1592130544-19759-2-git-send-email-krzk@kernel.o...
Changes since v1: 1. Disable the IRQ instead of using non-devm interface. --- drivers/spi/spi-fsl-dspi.c | 9 +++++++-- 1 file changed, 7 insertions(+), 2 deletions(-)
diff --git a/drivers/spi/spi-fsl-dspi.c b/drivers/spi/spi-fsl-dspi.c index 58190c94561f..023e05c53b85 100644 --- a/drivers/spi/spi-fsl-dspi.c +++ b/drivers/spi/spi-fsl-dspi.c @@ -1400,7 +1400,7 @@ static int dspi_probe(struct platform_device *pdev) ret = dspi_request_dma(dspi, res->start); if (ret < 0) { dev_err(&pdev->dev, "can't get dma channels\n"); - goto out_clk_put; + goto disable_irq; } }
@@ -1415,11 +1415,14 @@ static int dspi_probe(struct platform_device *pdev) ret = spi_register_controller(ctlr); if (ret != 0) { dev_err(&pdev->dev, "Problem registering DSPI ctlr\n"); - goto out_clk_put; + goto disable_irq; }
return ret;
+disable_irq: + if (dspi->irq > 0) + disable_irq(dspi->irq); out_clk_put: clk_disable_unprepare(dspi->clk); out_ctlr_put: @@ -1435,6 +1438,8 @@ static int dspi_remove(struct platform_device *pdev)
/* Disconnect from the SPI framework */ dspi_release_dma(dspi); + if (dspi->irq > 0) + disable_irq(dspi->irq); clk_disable_unprepare(dspi->clk); spi_unregister_controller(dspi->ctlr);
The interrupt handler calls completion and is IRQ requested before the completion is initialized. Logically it should be the other way.
Fixes: 4f5ee75ea171 ("spi: spi-fsl-dspi: Replace interruptible wait queue with a simple completion") Cc: stable@vger.kernel.org Signed-off-by: Krzysztof Kozlowski krzk@kernel.org
---
Changes since v1: 1. Rework the commit msg. --- drivers/spi/spi-fsl-dspi.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/spi/spi-fsl-dspi.c b/drivers/spi/spi-fsl-dspi.c index 023e05c53b85..080c5624bd1e 100644 --- a/drivers/spi/spi-fsl-dspi.c +++ b/drivers/spi/spi-fsl-dspi.c @@ -1385,6 +1385,8 @@ static int dspi_probe(struct platform_device *pdev) goto poll_mode; }
+ init_completion(&dspi->xfer_done); + ret = devm_request_irq(&pdev->dev, dspi->irq, dspi_interrupt, IRQF_SHARED, pdev->name, dspi); if (ret < 0) { @@ -1392,8 +1394,6 @@ static int dspi_probe(struct platform_device *pdev) goto out_clk_put; }
- init_completion(&dspi->xfer_done); - poll_mode:
if (dspi->devtype_data->trans_mode == DSPI_DMA_MODE) {
On 6/15/20 10:07 AM, Krzysztof Kozlowski wrote:
If interrupt comes late, during probe error path or device remove (could be triggered with CONFIG_DEBUG_SHIRQ), the interrupt handler dspi_interrupt() will access registers with the clock being disabled. This leads to external abort on non-linefetch on Toradex Colibri VF50 module (with Vybrid VF5xx):
$ echo 4002d000.spi > /sys/devices/platform/soc/40000000.bus/4002d000.spi/driver/unbind Unhandled fault: external abort on non-linefetch (0x1008) at 0x8887f02c Internal error: : 1008 [#1] ARM CPU: 0 PID: 136 Comm: sh Not tainted 5.7.0-next-20200610-00009-g5c913fa0f9c5-dirty #74 Hardware name: Freescale Vybrid VF5xx/VF6xx (Device Tree) (regmap_mmio_read32le) from [<8061885c>] (regmap_mmio_read+0x48/0x68) (regmap_mmio_read) from [<8060e3b8>] (_regmap_bus_reg_read+0x24/0x28) (_regmap_bus_reg_read) from [<80611c50>] (_regmap_read+0x70/0x1c0) (_regmap_read) from [<80611dec>] (regmap_read+0x4c/0x6c) (regmap_read) from [<80678ca0>] (dspi_interrupt+0x3c/0xa8) (dspi_interrupt) from [<8017acec>] (free_irq+0x26c/0x3cc) (free_irq) from [<8017dcec>] (devm_irq_release+0x1c/0x20) (devm_irq_release) from [<805f98ec>] (release_nodes+0x1e4/0x298) (release_nodes) from [<805f9ac8>] (devres_release_all+0x40/0x60) (devres_release_all) from [<805f5134>] (device_release_driver_internal+0x108/0x1ac) (device_release_driver_internal) from [<805f521c>] (device_driver_detach+0x20/0x24)
Fixes: 349ad66c0ab0 ("spi:Add Freescale DSPI driver for Vybrid VF610 platform") Cc: stable@vger.kernel.org Signed-off-by: Krzysztof Kozlowski krzk@kernel.org
This is an follow up of my other patch for I2C IMX driver [1]. Let's fix the issues consistently.
[1] https://lore.kernel.org/lkml/1592130544-19759-2-git-send-email-krzk@kernel.o...
Changes since v1:
- Disable the IRQ instead of using non-devm interface.
drivers/spi/spi-fsl-dspi.c | 9 +++++++-- 1 file changed, 7 insertions(+), 2 deletions(-)
diff --git a/drivers/spi/spi-fsl-dspi.c b/drivers/spi/spi-fsl-dspi.c index 58190c94561f..023e05c53b85 100644 --- a/drivers/spi/spi-fsl-dspi.c +++ b/drivers/spi/spi-fsl-dspi.c @@ -1400,7 +1400,7 @@ static int dspi_probe(struct platform_device *pdev) ret = dspi_request_dma(dspi, res->start); if (ret < 0) { dev_err(&pdev->dev, "can't get dma channels\n");
goto out_clk_put;
} }goto disable_irq;
@@ -1415,11 +1415,14 @@ static int dspi_probe(struct platform_device *pdev) ret = spi_register_controller(ctlr); if (ret != 0) { dev_err(&pdev->dev, "Problem registering DSPI ctlr\n");
goto out_clk_put;
}goto disable_irq;
return ret; +disable_irq:
- if (dspi->irq > 0)
disable_irq(dspi->irq);
out_clk_put: clk_disable_unprepare(dspi->clk); out_ctlr_put: @@ -1435,6 +1438,8 @@ static int dspi_remove(struct platform_device *pdev) /* Disconnect from the SPI framework */ dspi_release_dma(dspi);
- if (dspi->irq > 0)
disable_irq(dspi->irq);
What happens, if you re-bind the driver? Is the IRQ still working? Who is taking care of calling the enable_irq() again? What happens, if you really have a shared IRQ line? Is the IRQ disabled for all other devices on the same IRQ line?
clk_disable_unprepare(dspi->clk); spi_unregister_controller(dspi->ctlr);
Marc
On Mon, 15 Jun 2020 at 11:18, Marc Kleine-Budde mkl@pengutronix.de wrote:
On 6/15/20 10:07 AM, Krzysztof Kozlowski wrote:
If interrupt comes late, during probe error path or device remove (could be triggered with CONFIG_DEBUG_SHIRQ), the interrupt handler dspi_interrupt() will access registers with the clock being disabled. This leads to external abort on non-linefetch on Toradex Colibri VF50 module (with Vybrid VF5xx):
$ echo 4002d000.spi > /sys/devices/platform/soc/40000000.bus/4002d000.spi/driver/unbind Unhandled fault: external abort on non-linefetch (0x1008) at 0x8887f02c Internal error: : 1008 [#1] ARM CPU: 0 PID: 136 Comm: sh Not tainted 5.7.0-next-20200610-00009-g5c913fa0f9c5-dirty #74 Hardware name: Freescale Vybrid VF5xx/VF6xx (Device Tree) (regmap_mmio_read32le) from [<8061885c>] (regmap_mmio_read+0x48/0x68) (regmap_mmio_read) from [<8060e3b8>] (_regmap_bus_reg_read+0x24/0x28) (_regmap_bus_reg_read) from [<80611c50>] (_regmap_read+0x70/0x1c0) (_regmap_read) from [<80611dec>] (regmap_read+0x4c/0x6c) (regmap_read) from [<80678ca0>] (dspi_interrupt+0x3c/0xa8) (dspi_interrupt) from [<8017acec>] (free_irq+0x26c/0x3cc) (free_irq) from [<8017dcec>] (devm_irq_release+0x1c/0x20) (devm_irq_release) from [<805f98ec>] (release_nodes+0x1e4/0x298) (release_nodes) from [<805f9ac8>] (devres_release_all+0x40/0x60) (devres_release_all) from [<805f5134>] (device_release_driver_internal+0x108/0x1ac) (device_release_driver_internal) from [<805f521c>] (device_driver_detach+0x20/0x24)
Fixes: 349ad66c0ab0 ("spi:Add Freescale DSPI driver for Vybrid VF610 platform") Cc: stable@vger.kernel.org Signed-off-by: Krzysztof Kozlowski krzk@kernel.org
This is an follow up of my other patch for I2C IMX driver [1]. Let's fix the issues consistently.
[1] https://lore.kernel.org/lkml/1592130544-19759-2-git-send-email-krzk@kernel.o...
Changes since v1:
- Disable the IRQ instead of using non-devm interface.
drivers/spi/spi-fsl-dspi.c | 9 +++++++-- 1 file changed, 7 insertions(+), 2 deletions(-)
diff --git a/drivers/spi/spi-fsl-dspi.c b/drivers/spi/spi-fsl-dspi.c index 58190c94561f..023e05c53b85 100644 --- a/drivers/spi/spi-fsl-dspi.c +++ b/drivers/spi/spi-fsl-dspi.c @@ -1400,7 +1400,7 @@ static int dspi_probe(struct platform_device *pdev) ret = dspi_request_dma(dspi, res->start); if (ret < 0) { dev_err(&pdev->dev, "can't get dma channels\n");
goto out_clk_put;
goto disable_irq; } }
@@ -1415,11 +1415,14 @@ static int dspi_probe(struct platform_device *pdev) ret = spi_register_controller(ctlr); if (ret != 0) { dev_err(&pdev->dev, "Problem registering DSPI ctlr\n");
goto out_clk_put;
goto disable_irq; } return ret;
+disable_irq:
if (dspi->irq > 0)
disable_irq(dspi->irq);
out_clk_put: clk_disable_unprepare(dspi->clk); out_ctlr_put: @@ -1435,6 +1438,8 @@ static int dspi_remove(struct platform_device *pdev)
/* Disconnect from the SPI framework */ dspi_release_dma(dspi);
if (dspi->irq > 0)
disable_irq(dspi->irq);
What happens, if you re-bind the driver? Is the IRQ still working? Who is taking care of calling the enable_irq() again? What happens, if you really have a shared IRQ line? Is the IRQ disabled for all other devices on the same IRQ line?
Yup, devm is completely broken for shared IRQs.
clk_disable_unprepare(dspi->clk); spi_unregister_controller(dspi->ctlr);
Marc
-- Pengutronix e.K. | Marc Kleine-Budde | Embedded Linux | https://www.pengutronix.de | Vertretung West/Dortmund | Phone: +49-231-2826-924 | Amtsgericht Hildesheim, HRA 2686 | Fax: +49-5121-206917-5555 |
On Mon, Jun 15, 2020 at 12:23:41PM +0300, Vladimir Oltean wrote:
On Mon, 15 Jun 2020 at 11:18, Marc Kleine-Budde mkl@pengutronix.de wrote:
On 6/15/20 10:07 AM, Krzysztof Kozlowski wrote:
If interrupt comes late, during probe error path or device remove (could be triggered with CONFIG_DEBUG_SHIRQ), the interrupt handler dspi_interrupt() will access registers with the clock being disabled. This leads to external abort on non-linefetch on Toradex Colibri VF50 module (with Vybrid VF5xx):
$ echo 4002d000.spi > /sys/devices/platform/soc/40000000.bus/4002d000.spi/driver/unbind Unhandled fault: external abort on non-linefetch (0x1008) at 0x8887f02c Internal error: : 1008 [#1] ARM CPU: 0 PID: 136 Comm: sh Not tainted 5.7.0-next-20200610-00009-g5c913fa0f9c5-dirty #74 Hardware name: Freescale Vybrid VF5xx/VF6xx (Device Tree) (regmap_mmio_read32le) from [<8061885c>] (regmap_mmio_read+0x48/0x68) (regmap_mmio_read) from [<8060e3b8>] (_regmap_bus_reg_read+0x24/0x28) (_regmap_bus_reg_read) from [<80611c50>] (_regmap_read+0x70/0x1c0) (_regmap_read) from [<80611dec>] (regmap_read+0x4c/0x6c) (regmap_read) from [<80678ca0>] (dspi_interrupt+0x3c/0xa8) (dspi_interrupt) from [<8017acec>] (free_irq+0x26c/0x3cc) (free_irq) from [<8017dcec>] (devm_irq_release+0x1c/0x20) (devm_irq_release) from [<805f98ec>] (release_nodes+0x1e4/0x298) (release_nodes) from [<805f9ac8>] (devres_release_all+0x40/0x60) (devres_release_all) from [<805f5134>] (device_release_driver_internal+0x108/0x1ac) (device_release_driver_internal) from [<805f521c>] (device_driver_detach+0x20/0x24)
Fixes: 349ad66c0ab0 ("spi:Add Freescale DSPI driver for Vybrid VF610 platform") Cc: stable@vger.kernel.org Signed-off-by: Krzysztof Kozlowski krzk@kernel.org
This is an follow up of my other patch for I2C IMX driver [1]. Let's fix the issues consistently.
[1] https://lore.kernel.org/lkml/1592130544-19759-2-git-send-email-krzk@kernel.o...
Changes since v1:
- Disable the IRQ instead of using non-devm interface.
drivers/spi/spi-fsl-dspi.c | 9 +++++++-- 1 file changed, 7 insertions(+), 2 deletions(-)
diff --git a/drivers/spi/spi-fsl-dspi.c b/drivers/spi/spi-fsl-dspi.c index 58190c94561f..023e05c53b85 100644 --- a/drivers/spi/spi-fsl-dspi.c +++ b/drivers/spi/spi-fsl-dspi.c @@ -1400,7 +1400,7 @@ static int dspi_probe(struct platform_device *pdev) ret = dspi_request_dma(dspi, res->start); if (ret < 0) { dev_err(&pdev->dev, "can't get dma channels\n");
goto out_clk_put;
goto disable_irq; } }
@@ -1415,11 +1415,14 @@ static int dspi_probe(struct platform_device *pdev) ret = spi_register_controller(ctlr); if (ret != 0) { dev_err(&pdev->dev, "Problem registering DSPI ctlr\n");
goto out_clk_put;
goto disable_irq; } return ret;
+disable_irq:
if (dspi->irq > 0)
disable_irq(dspi->irq);
out_clk_put: clk_disable_unprepare(dspi->clk); out_ctlr_put: @@ -1435,6 +1438,8 @@ static int dspi_remove(struct platform_device *pdev)
/* Disconnect from the SPI framework */ dspi_release_dma(dspi);
if (dspi->irq > 0)
disable_irq(dspi->irq);
What happens, if you re-bind the driver? Is the IRQ still working? Who is taking care of calling the enable_irq() again? What happens, if you really have a shared IRQ line? Is the IRQ disabled for all other devices on the same IRQ line?
Yup, devm is completely broken for shared IRQs.
Then we're back to this massive rework of using non-devm interface.
Best regards, Krzysztof
On Mon, Jun 15, 2020 at 10:17:07AM +0200, Marc Kleine-Budde wrote:
On 6/15/20 10:07 AM, Krzysztof Kozlowski wrote:
If interrupt comes late, during probe error path or device remove (could be triggered with CONFIG_DEBUG_SHIRQ), the interrupt handler dspi_interrupt() will access registers with the clock being disabled. This leads to external abort on non-linefetch on Toradex Colibri VF50 module (with Vybrid VF5xx):
Unhandled fault: external abort on non-linefetch (0x1008) at 0x8887f02c Internal error: : 1008 [#1] ARM CPU: 0 PID: 136 Comm: sh Not tainted 5.7.0-next-20200610-00009-g5c913fa0f9c5-dirty #74 Hardware name: Freescale Vybrid VF5xx/VF6xx (Device Tree) (regmap_mmio_read32le) from [<8061885c>] (regmap_mmio_read+0x48/0x68) (regmap_mmio_read) from [<8060e3b8>] (_regmap_bus_reg_read+0x24/0x28) (_regmap_bus_reg_read) from [<80611c50>] (_regmap_read+0x70/0x1c0)
Please think hard before including complete backtraces in upstream reports, they are very large and contain almost no useful information relative to their size so often obscure the relevant content in your message. If part of the backtrace is usefully illustrative (it often is for search engines if nothing else) then it's usually better to pull out the relevant sections.
+disable_irq:
- if (dspi->irq > 0)
disable_irq(dspi->irq);
What happens, if you re-bind the driver? Is the IRQ still working? Who is taking care of calling the enable_irq() again? What happens, if you really have a shared IRQ line? Is the IRQ disabled for all other devices on the same IRQ line?
Indeed. The upshot of all this is that the interrupt needs to be freed not disabled before the clocks are disabled, or some other mechanism needs to be used to ensure that the interrupt handler won't attempt to access the hardware when it shouldn't. As Vladimir says there are serious issues using devm for interrupt handlers (or anything else that might cause code to be run) due to problems like this.
On Mon, 15 Jun 2020 at 15:35, Mark Brown broonie@kernel.org wrote:
Indeed. The upshot of all this is that the interrupt needs to be freed not disabled before the clocks are disabled, or some other mechanism needs to be used to ensure that the interrupt handler won't attempt to access the hardware when it shouldn't. As Vladimir says there are serious issues using devm for interrupt handlers (or anything else that might cause code to be run) due to problems like this.
And the down-shot is that whatever is done in dspi_remove (free_irq) also needs to be done in dspi_suspend, but with extra care in dspi_resume not only to request the irq again, but also to flush the module's FIFOs and clear interrupts, because there might have been nasty stuff uncaught during sleep:
regmap_update_bits(dspi->regmap, SPI_MCR, SPI_MCR_CLR_TXF | SPI_MCR_CLR_RXF, SPI_MCR_CLR_TXF | SPI_MCR_CLR_RXF); regmap_write(dspi->regmap, SPI_SR, SPI_SR_CLEAR);
So it's pretty messy.
-Vladimir
On Mon, Jun 15, 2020 at 03:56:01PM +0300, Vladimir Oltean wrote:
And the down-shot is that whatever is done in dspi_remove (free_irq) also needs to be done in dspi_suspend, but with extra care in dspi_resume not only to request the irq again, but also to flush the module's FIFOs and clear interrupts, because there might have been nasty stuff uncaught during sleep:
regmap_update_bits(dspi->regmap, SPI_MCR, SPI_MCR_CLR_TXF | SPI_MCR_CLR_RXF, SPI_MCR_CLR_TXF | SPI_MCR_CLR_RXF); regmap_write(dspi->regmap, SPI_SR, SPI_SR_CLEAR);
So it's pretty messy.
It's a bit unusual to need to actually free the IRQ over suspend - what's driving that requirement here? I can see we might need to reinit the hardware but usually the interrupt handler can be left there safely.
On Mon, 15 Jun 2020 at 16:10, Mark Brown broonie@kernel.org wrote:
It's a bit unusual to need to actually free the IRQ over suspend - what's driving that requirement here?
clk_disable_unprepare(dspi->clk); is driving the requirement - same as in dspi_remove case, the module will fault when its registers are accessed without a clock.
-Vladimir
On Mon, Jun 15, 2020 at 04:12:28PM +0300, Vladimir Oltean wrote:
On Mon, 15 Jun 2020 at 16:10, Mark Brown broonie@kernel.org wrote:
It's a bit unusual to need to actually free the IRQ over suspend - what's driving that requirement here?
clk_disable_unprepare(dspi->clk); is driving the requirement - same as in dspi_remove case, the module will fault when its registers are accessed without a clock.
I see - this could be fixed by having the interrupt handler bounce the clock on, there's a little overhead from that but hopefully not too much. That should also help with the remove case I guess so long as the clock is registered before the interrupt is requested?
On Mon, 15 Jun 2020 at 16:24, Mark Brown broonie@kernel.org wrote:
On Mon, Jun 15, 2020 at 04:12:28PM +0300, Vladimir Oltean wrote:
On Mon, 15 Jun 2020 at 16:10, Mark Brown broonie@kernel.org wrote:
It's a bit unusual to need to actually free the IRQ over suspend - what's driving that requirement here?
clk_disable_unprepare(dspi->clk); is driving the requirement - same as in dspi_remove case, the module will fault when its registers are accessed without a clock.
I see - this could be fixed by having the interrupt handler bounce the clock on, there's a little overhead from that but hopefully not too much. That should also help with the remove case I guess so long as the clock is registered before the interrupt is requested?
Doesn't this mean that we risk leaving the clock enabled during suspend? Is there any function in the SPI core that quiesces any pending transactions, and then stops the controller? I would have expected spi_controller_suspend to do that, but I'm not sure (it doesn't look like it).
On Mon, Jun 15, 2020 at 04:29:15PM +0300, Vladimir Oltean wrote:
On Mon, 15 Jun 2020 at 16:24, Mark Brown broonie@kernel.org wrote:
I see - this could be fixed by having the interrupt handler bounce the clock on, there's a little overhead from that but hopefully not too much. That should also help with the remove case I guess so long as the clock is registered before the interrupt is requested?
Doesn't this mean that we risk leaving the clock enabled during suspend?
If we suspend with the interrupt handler running but IIRC the suspend sequence will allow interrupt handlers to complete.
Is there any function in the SPI core that quiesces any pending transactions, and then stops the controller? I would have expected spi_controller_suspend to do that, but I'm not sure (it doesn't look like it).
spi_stop_queue() should do this (but will time out if the queue is too busy). It doesn't stop new transactions being issued, I'm guessing because that'll most likely cause more problems than it solves but that code predates my involvement.
On Mon, Jun 15, 2020 at 04:12:28PM +0300, Vladimir Oltean wrote:
On Mon, 15 Jun 2020 at 16:10, Mark Brown broonie@kernel.org wrote:
It's a bit unusual to need to actually free the IRQ over suspend - what's driving that requirement here?
clk_disable_unprepare(dspi->clk); is driving the requirement - same as in dspi_remove case, the module will fault when its registers are accessed without a clock.
In few cases when I have shared interrupt in different drivers, they were just disabling it during suspend. Why it has to be freed?
Best regards, Krzysztof
On Mon, 15 Jun 2020 at 16:41, Krzysztof Kozlowski krzk@kernel.org wrote:
On Mon, Jun 15, 2020 at 04:12:28PM +0300, Vladimir Oltean wrote:
On Mon, 15 Jun 2020 at 16:10, Mark Brown broonie@kernel.org wrote:
It's a bit unusual to need to actually free the IRQ over suspend - what's driving that requirement here?
clk_disable_unprepare(dspi->clk); is driving the requirement - same as in dspi_remove case, the module will fault when its registers are accessed without a clock.
In few cases when I have shared interrupt in different drivers, they were just disabling it during suspend. Why it has to be freed?
Best regards, Krzysztof
Not saying it _has_ to be freed, just to be prevented from running concurrently with us disabling the clock. But if we can get away in dspi_suspend with just disable_irq, can't we also get away in dspi_remove with just devm_free_irq?
-Vladimir
On Mon, Jun 15, 2020 at 05:23:28PM +0300, Vladimir Oltean wrote:
On Mon, 15 Jun 2020 at 16:41, Krzysztof Kozlowski krzk@kernel.org wrote:
On Mon, Jun 15, 2020 at 04:12:28PM +0300, Vladimir Oltean wrote:
On Mon, 15 Jun 2020 at 16:10, Mark Brown broonie@kernel.org wrote:
It's a bit unusual to need to actually free the IRQ over suspend - what's driving that requirement here?
clk_disable_unprepare(dspi->clk); is driving the requirement - same as in dspi_remove case, the module will fault when its registers are accessed without a clock.
In few cases when I have shared interrupt in different drivers, they were just disabling it during suspend. Why it has to be freed?
Best regards, Krzysztof
Not saying it _has_ to be freed, just to be prevented from running concurrently with us disabling the clock. But if we can get away in dspi_suspend with just disable_irq, can't we also get away in dspi_remove with just devm_free_irq?
One reason why they have to be different could be following scenario: 1. Device could be unbound any time and disabling IRQ in remove() would effectively disable the IRQ also for other devices using this shared line. First disable_irq() really disables it, the latter just increases the counter. 2. However during system suspend, it is expected that all drivers in their suspend (and later resume) callbacks will do the same - disable the shared IRQ line. And finally the system disables interrupts globally so the line will be balanced.
Freeing IRQ solves the case #1 without causing any imbalance between enables/disables or requests/frees. Disabling IRQ solves the #2, also without any imbalance.
Best regards, Krzysztof
On Mon, 15 Jun 2020 at 17:57, Krzysztof Kozlowski krzk@kernel.org wrote:
On Mon, Jun 15, 2020 at 05:23:28PM +0300, Vladimir Oltean wrote:
On Mon, 15 Jun 2020 at 16:41, Krzysztof Kozlowski krzk@kernel.org wrote:
On Mon, Jun 15, 2020 at 04:12:28PM +0300, Vladimir Oltean wrote:
On Mon, 15 Jun 2020 at 16:10, Mark Brown broonie@kernel.org wrote:
It's a bit unusual to need to actually free the IRQ over suspend - what's driving that requirement here?
clk_disable_unprepare(dspi->clk); is driving the requirement - same as in dspi_remove case, the module will fault when its registers are accessed without a clock.
In few cases when I have shared interrupt in different drivers, they were just disabling it during suspend. Why it has to be freed?
Best regards, Krzysztof
Not saying it _has_ to be freed, just to be prevented from running concurrently with us disabling the clock. But if we can get away in dspi_suspend with just disable_irq, can't we also get away in dspi_remove with just devm_free_irq?
One reason why they have to be different could be following scenario:
- Device could be unbound any time and disabling IRQ in remove() would effectively disable the IRQ also for other devices using this shared line. First disable_irq() really disables it, the latter just increases the counter.
- However during system suspend, it is expected that all drivers in their suspend (and later resume) callbacks will do the same - disable the shared IRQ line. And finally the system disables interrupts globally so the line will be balanced.
Freeing IRQ solves the case #1 without causing any imbalance between enables/disables or requests/frees. Disabling IRQ solves the #2, also without any imbalance.
Best regards, Krzysztof
So the answer to my question is 'yes', right?
On Mon, Jun 15, 2020 at 03:56:01PM +0300, Vladimir Oltean wrote:
On Mon, 15 Jun 2020 at 15:35, Mark Brown broonie@kernel.org wrote:
Indeed. The upshot of all this is that the interrupt needs to be freed not disabled before the clocks are disabled, or some other mechanism needs to be used to ensure that the interrupt handler won't attempt to access the hardware when it shouldn't. As Vladimir says there are serious issues using devm for interrupt handlers (or anything else that might cause code to be run) due to problems like this.
And the down-shot is that whatever is done in dspi_remove (free_irq) also needs to be done in dspi_suspend, but with extra care in dspi_resume not only to request the irq again, but also to flush the module's FIFOs and clear interrupts, because there might have been nasty stuff uncaught during sleep:
regmap_update_bits(dspi->regmap, SPI_MCR, SPI_MCR_CLR_TXF | SPI_MCR_CLR_RXF, SPI_MCR_CLR_TXF | SPI_MCR_CLR_RXF); regmap_write(dspi->regmap, SPI_SR, SPI_SR_CLEAR);
So it's pretty messy.
It is a slightly different bug which so this patch should have a follow up.
Best regards, Krzysztof
On Mon, 15 Jun 2020 at 16:10, Krzysztof Kozlowski krzk@kernel.org wrote:
It is a slightly different bug which so this patch should have a follow up.
Best regards, Krzysztof
Why is it a different bug? It's the same bug.
On Mon, Jun 15, 2020 at 04:14:06PM +0300, Vladimir Oltean wrote:
On Mon, 15 Jun 2020 at 16:10, Krzysztof Kozlowski krzk@kernel.org wrote:
It is a slightly different bug which so this patch should have a follow up.
Best regards, Krzysztof
Why is it a different bug? It's the same bug.
One bug is using devm-interface for shared interrupts and second is not caring about suspend/resume.
Best regards, Krzysztof
On Mon, 15 Jun 2020 at 16:28, Krzysztof Kozlowski krzk@kernel.org wrote:
On Mon, Jun 15, 2020 at 04:14:06PM +0300, Vladimir Oltean wrote:
On Mon, 15 Jun 2020 at 16:10, Krzysztof Kozlowski krzk@kernel.org wrote:
It is a slightly different bug which so this patch should have a follow up.
Best regards, Krzysztof
Why is it a different bug? It's the same bug.
One bug is using devm-interface for shared interrupts and second is not caring about suspend/resume.
Best regards, Krzysztof
The problem is that you don't have a way to stop servicing a shared interrupt safely and on demand, before clk_disable_unprepare. So it's exactly the same problem on suspend and on remove. Avoiding to think about the suspend problem now means that you'll end up having an overall worse solution.
linux-stable-mirror@lists.linaro.org