If interrupt comes late, during probe error path or device remove (could be triggered with CONFIG_DEBUG_SHIRQ), the interrupt handler dspi_interrupt() will access registers with the clock being disabled. This leads to external abort on non-linefetch on Toradex Colibri VF50 module (with Vybrid VF5xx):
  $ echo 4002d000.spi > /sys/devices/platform/soc/40000000.bus/4002d000.spi/driver/unbind

  Unhandled fault: external abort on non-linefetch (0x1008) at 0x8887f02c
  Internal error: : 1008 [#1] ARM
  CPU: 0 PID: 136 Comm: sh Not tainted 5.7.0-next-20200610-00009-g5c913fa0f9c5-dirty #74
  Hardware name: Freescale Vybrid VF5xx/VF6xx (Device Tree)
  (regmap_mmio_read32le) from [<8061885c>] (regmap_mmio_read+0x48/0x68)
  (regmap_mmio_read) from [<8060e3b8>] (_regmap_bus_reg_read+0x24/0x28)
  (_regmap_bus_reg_read) from [<80611c50>] (_regmap_read+0x70/0x1c0)
  (_regmap_read) from [<80611dec>] (regmap_read+0x4c/0x6c)
  (regmap_read) from [<80678ca0>] (dspi_interrupt+0x3c/0xa8)
  (dspi_interrupt) from [<8017acec>] (free_irq+0x26c/0x3cc)
  (free_irq) from [<8017dcec>] (devm_irq_release+0x1c/0x20)
  (devm_irq_release) from [<805f98ec>] (release_nodes+0x1e4/0x298)
  (release_nodes) from [<805f9ac8>] (devres_release_all+0x40/0x60)
  (devres_release_all) from [<805f5134>] (device_release_driver_internal+0x108/0x1ac)
  (device_release_driver_internal) from [<805f521c>] (device_driver_detach+0x20/0x24)
The resource-managed framework should not be used for interrupt handling, because the resource will be released too late - after disabling clocks. The interrupt handler is not prepared for such case.
Fixes: 349ad66c0ab0 ("spi:Add Freescale DSPI driver for Vybrid VF610 platform")
Cc: stable@vger.kernel.org
Signed-off-by: Krzysztof Kozlowski krzk@kernel.org
---
This is a follow-up to my other patch for the I2C IMX driver [1]. Let's fix the issues consistently.
[1] https://lore.kernel.org/lkml/1592130544-19759-2-git-send-email-krzk@kernel.o...
---
 drivers/spi/spi-fsl-dspi.c | 13 +++++++++----
 1 file changed, 9 insertions(+), 4 deletions(-)

diff --git a/drivers/spi/spi-fsl-dspi.c b/drivers/spi/spi-fsl-dspi.c
index 58190c94561f..57e7a626ba00 100644
--- a/drivers/spi/spi-fsl-dspi.c
+++ b/drivers/spi/spi-fsl-dspi.c
@@ -1385,8 +1385,8 @@ static int dspi_probe(struct platform_device *pdev)
 		goto poll_mode;
 	}
 
-	ret = devm_request_irq(&pdev->dev, dspi->irq, dspi_interrupt,
-			       IRQF_SHARED, pdev->name, dspi);
+	ret = request_threaded_irq(dspi->irq, dspi_interrupt, NULL,
+				   IRQF_SHARED, pdev->name, dspi);
 	if (ret < 0) {
 		dev_err(&pdev->dev, "Unable to attach DSPI interrupt\n");
 		goto out_clk_put;
@@ -1400,7 +1400,7 @@ static int dspi_probe(struct platform_device *pdev)
 		ret = dspi_request_dma(dspi, res->start);
 		if (ret < 0) {
 			dev_err(&pdev->dev, "can't get dma channels\n");
-			goto out_clk_put;
+			goto out_free_irq;
 		}
 	}
 
@@ -1415,11 +1415,14 @@ static int dspi_probe(struct platform_device *pdev)
 	ret = spi_register_controller(ctlr);
 	if (ret != 0) {
 		dev_err(&pdev->dev, "Problem registering DSPI ctlr\n");
-		goto out_clk_put;
+		goto out_free_irq;
 	}
 
 	return ret;
 
+out_free_irq:
+	if (dspi->irq > 0)
+		free_irq(dspi->irq, dspi);
 out_clk_put:
 	clk_disable_unprepare(dspi->clk);
 out_ctlr_put:
@@ -1435,6 +1438,8 @@ static int dspi_remove(struct platform_device *pdev)
 
 	/* Disconnect from the SPI framework */
 	dspi_release_dma(dspi);
+	if (dspi->irq > 0)
+		free_irq(dspi->irq, dspi);
 	clk_disable_unprepare(dspi->clk);
 	spi_unregister_controller(dspi->ctlr);
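Editor's sketch, not part of the patch: the ordering problem described in the commit message can be modelled in plain userspace C. Everything below is hypothetical (struct model_dev, model_handler(), model_free_irq() and the two teardown functions are illustration only, not kernel APIs). It assumes, as the backtrace suggests, that freeing the shared IRQ invokes the handler one more time, and that the handler reads a register.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model; none of these names are kernel APIs. */
struct model_dev {
	bool clk_enabled;   /* stands in for the DSPI clock gate */
	bool irq_requested; /* handler is still registered */
	int faults;         /* register reads attempted with the clock off */
};

/* Stand-in for dspi_interrupt(): it starts by reading a register. */
static void model_handler(struct model_dev *d)
{
	assert(d != NULL);
	if (!d->clk_enabled)
		d->faults++; /* on the Vybrid this is the external abort */
}

/*
 * Stand-in for free_irq() on a shared line with CONFIG_DEBUG_SHIRQ:
 * the handler is called once more while the IRQ is being freed.
 */
static void model_free_irq(struct model_dev *d)
{
	if (!d->irq_requested)
		return;
	d->irq_requested = false;
	model_handler(d);
}

static void model_clk_disable(struct model_dev *d)
{
	d->clk_enabled = false;
}

/* devm order: remove() disables the clock, devres frees the IRQ later. */
static int teardown_devm_order(void)
{
	struct model_dev d = { .clk_enabled = true, .irq_requested = true };

	model_clk_disable(&d); /* end of remove() */
	model_free_irq(&d);    /* devres_release_all() runs afterwards */
	return d.faults;
}

/* Patch order: free the IRQ explicitly, then disable the clock. */
static int teardown_explicit_order(void)
{
	struct model_dev d = { .clk_enabled = true, .irq_requested = true };

	model_free_irq(&d);
	model_clk_disable(&d);
	return d.faults;
}
```

In this model teardown_devm_order() counts one faulting register access and teardown_explicit_order() counts none, which mirrors the reordering the patch makes in dspi_remove() and in the probe error path.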
If interrupt fires early, the dspi_interrupt() could complete (dspi->xfer_done) before its initialization happens.
Fixes: 4f5ee75ea171 ("spi: spi-fsl-dspi: Replace interruptible wait queue with a simple completion")
Cc: stable@vger.kernel.org
Signed-off-by: Krzysztof Kozlowski krzk@kernel.org
---
 drivers/spi/spi-fsl-dspi.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/spi/spi-fsl-dspi.c b/drivers/spi/spi-fsl-dspi.c
index 57e7a626ba00..efb63ed9fd86 100644
--- a/drivers/spi/spi-fsl-dspi.c
+++ b/drivers/spi/spi-fsl-dspi.c
@@ -1385,6 +1385,8 @@ static int dspi_probe(struct platform_device *pdev)
 		goto poll_mode;
 	}
 
+	init_completion(&dspi->xfer_done);
+
 	ret = request_threaded_irq(dspi->irq, dspi_interrupt, NULL,
 				   IRQF_SHARED, pdev->name, dspi);
 	if (ret < 0) {
@@ -1392,8 +1394,6 @@ static int dspi_probe(struct platform_device *pdev)
 		goto out_clk_put;
 	}
 
-	init_completion(&dspi->xfer_done);
-
 poll_mode:
 
 	if (dspi->devtype_data->trans_mode == DSPI_DMA_MODE) {
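Editor's sketch, not part of the patch: a small userspace model of the initialization order this patch changes. All names (struct model_completion, model_request_irq(), probe_model()) are hypothetical stand-ins, not kernel types or functions. The model assumes the handler can run immediately once the IRQ is requested, and that it only touches the completion when the hardware reports a finished transfer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model; these are not the kernel's types. */
struct model_completion {
	bool initialized;
	unsigned int done;
};

struct model_dspi {
	struct model_completion xfer_done;
	bool transfer_finished; /* what the status register would report */
	int uninit_uses;        /* complete() calls before initialization */
};

static void model_init_completion(struct model_completion *c)
{
	assert(c != NULL);
	c->initialized = true;
	c->done = 0;
}

static void model_complete(struct model_dspi *d)
{
	if (!d->xfer_done.initialized)
		d->uninit_uses++;
	d->xfer_done.done++;
}

/* Stand-in for dspi_interrupt(): signals only when a transfer finished. */
static bool model_handler(struct model_dspi *d)
{
	if (!d->transfer_finished)
		return false; /* corresponds to IRQ_NONE */
	model_complete(d);
	return true;          /* corresponds to IRQ_HANDLED */
}

/* Stand-in for requesting a shared IRQ; the handler may run right away. */
static void model_request_irq(struct model_dspi *d, bool fires_immediately)
{
	if (fires_immediately)
		model_handler(d);
}

/* Returns how often the completion was used before being initialized. */
static int probe_model(bool init_first, bool transfer_finished)
{
	struct model_dspi d = { .transfer_finished = transfer_finished };

	if (init_first)
		model_init_completion(&d.xfer_done);
	model_request_irq(&d, true);
	if (!init_first)
		model_init_completion(&d.xfer_done);
	return d.uninit_uses;
}
```

In this model the old order is only a problem when an early interrupt coincides with a pending "transfer finished" status: probe_model(false, true) reports one uninitialized use, while probe_model(false, false) and probe_model(true, true) report none.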
On Sun, 14 Jun 2020 at 13:56, Krzysztof Kozlowski krzk@kernel.org wrote:
If interrupt fires early, the dspi_interrupt() could complete (dspi->xfer_done) before its initialization happens.
Fixes: 4f5ee75ea171 ("spi: spi-fsl-dspi: Replace interruptible wait queue with a simple completion") Cc: stable@vger.kernel.org Signed-off-by: Krzysztof Kozlowski krzk@kernel.org
Why would an interrupt fire before spi_register_controller, therefore before dspi_transfer_one_message could get called? Is this master or slave mode?
 drivers/spi/spi-fsl-dspi.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/spi/spi-fsl-dspi.c b/drivers/spi/spi-fsl-dspi.c
index 57e7a626ba00..efb63ed9fd86 100644
--- a/drivers/spi/spi-fsl-dspi.c
+++ b/drivers/spi/spi-fsl-dspi.c
@@ -1385,6 +1385,8 @@ static int dspi_probe(struct platform_device *pdev)
 		goto poll_mode;
 	}
 
+	init_completion(&dspi->xfer_done);
+
 	ret = request_threaded_irq(dspi->irq, dspi_interrupt, NULL,
 				   IRQF_SHARED, pdev->name, dspi);
 	if (ret < 0) {
@@ -1392,8 +1394,6 @@ static int dspi_probe(struct platform_device *pdev)
 		goto out_clk_put;
 	}
 
-	init_completion(&dspi->xfer_done);
-
 poll_mode:
 
 	if (dspi->devtype_data->trans_mode == DSPI_DMA_MODE) {
--
2.7.4
On Sun, Jun 14, 2020 at 02:14:15PM +0300, Vladimir Oltean wrote:
On Sun, 14 Jun 2020 at 13:56, Krzysztof Kozlowski krzk@kernel.org wrote:
If interrupt fires early, the dspi_interrupt() could complete (dspi->xfer_done) before its initialization happens.
Fixes: 4f5ee75ea171 ("spi: spi-fsl-dspi: Replace interruptible wait queue with a simple completion") Cc: stable@vger.kernel.org Signed-off-by: Krzysztof Kozlowski krzk@kernel.org
Why would an interrupt fire before spi_register_controller, therefore before dspi_transfer_one_message could get called? Is this master or slave mode?
I guess practically it won't fire. It's more of a matter of logical order and:
1. Someone might fix the CONFIG_DEBUG_SHIRQ_FIXME one day,
2. The hardware is actually initialized before and someone could attach to SPI bus some weird device.
Best regards, Krzysztof
On Sun, 14 Jun 2020 at 14:18, Krzysztof Kozlowski krzk@kernel.org wrote:
On Sun, Jun 14, 2020 at 02:14:15PM +0300, Vladimir Oltean wrote:
On Sun, 14 Jun 2020 at 13:56, Krzysztof Kozlowski krzk@kernel.org wrote:
If interrupt fires early, the dspi_interrupt() could complete (dspi->xfer_done) before its initialization happens.
Fixes: 4f5ee75ea171 ("spi: spi-fsl-dspi: Replace interruptible wait queue with a simple completion") Cc: stable@vger.kernel.org Signed-off-by: Krzysztof Kozlowski krzk@kernel.org
Why would an interrupt fire before spi_register_controller, therefore before dspi_transfer_one_message could get called? Is this master or slave mode?
I guess practically it won't fire. It's more of a matter of logical order and:
- Someone might fix the CONFIG_DEBUG_SHIRQ_FIXME one day,
And what if CONFIG_DEBUG_SHIRQ_FIXME gets fixed? I uncommented it, and still no issues. dspi_interrupt checks the status bit of the hw, sees there's nothing to do, and returns IRQ_NONE.
- The hardware is actually initialized before and someone could attach to SPI bus some weird device.
Some weird device that does what?
Best regards, Krzysztof
Thanks, -Vladimir
On Sun, 14 Jun 2020 at 16:39, Vladimir Oltean olteanv@gmail.com wrote:
On Sun, 14 Jun 2020 at 14:18, Krzysztof Kozlowski krzk@kernel.org wrote:
On Sun, Jun 14, 2020 at 02:14:15PM +0300, Vladimir Oltean wrote:
On Sun, 14 Jun 2020 at 13:56, Krzysztof Kozlowski krzk@kernel.org wrote:
If interrupt fires early, the dspi_interrupt() could complete (dspi->xfer_done) before its initialization happens.
Fixes: 4f5ee75ea171 ("spi: spi-fsl-dspi: Replace interruptible wait queue with a simple completion")
Also please note that this patch merely replaced an init_waitqueue_head with init_completion. But the "bug" (if we can call it that) originates from even before.
Cc: stable@vger.kernel.org Signed-off-by: Krzysztof Kozlowski krzk@kernel.org
Why would an interrupt fire before spi_register_controller, therefore before dspi_transfer_one_message could get called? Is this master or slave mode?
I guess practically it won't fire. It's more of a matter of logical order and:
- Someone might fix the CONFIG_DEBUG_SHIRQ_FIXME one day,
And what if CONFIG_DEBUG_SHIRQ_FIXME gets fixed? I uncommented it, and still no issues. dspi_interrupt checks the status bit of the hw, sees there's nothing to do, and returns IRQ_NONE.
- The hardware is actually initialized before and someone could attach to SPI bus some weird device.
Some weird device that does what?
Best regards, Krzysztof
Thanks, -Vladimir
On Sun, Jun 14, 2020 at 04:43:28PM +0300, Vladimir Oltean wrote:
On Sun, 14 Jun 2020 at 16:39, Vladimir Oltean olteanv@gmail.com wrote:
On Sun, 14 Jun 2020 at 14:18, Krzysztof Kozlowski krzk@kernel.org wrote:
On Sun, Jun 14, 2020 at 02:14:15PM +0300, Vladimir Oltean wrote:
On Sun, 14 Jun 2020 at 13:56, Krzysztof Kozlowski krzk@kernel.org wrote:
If interrupt fires early, the dspi_interrupt() could complete (dspi->xfer_done) before its initialization happens.
Fixes: 4f5ee75ea171 ("spi: spi-fsl-dspi: Replace interruptible wait queue with a simple completion")
Also please note that this patch merely replaced an init_waitqueue_head with init_completion. But the "bug" (if we can call it that) originates from even before.
Yeah, I know, the Fixes is not accurate. Backport to earlier kernels would be manual so I am not sure if accurate Fixes matter.
Cc: stable@vger.kernel.org Signed-off-by: Krzysztof Kozlowski krzk@kernel.org
Why would an interrupt fire before spi_register_controller, therefore before dspi_transfer_one_message could get called? Is this master or slave mode?
I guess practically it won't fire. It's more of a matter of logical order and:
- Someone might fix the CONFIG_DEBUG_SHIRQ_FIXME one day,
And what if CONFIG_DEBUG_SHIRQ_FIXME gets fixed? I uncommented it, and still no issues. dspi_interrupt checks the status bit of the hw, sees there's nothing to do, and returns IRQ_NONE.
Indeed, still the logical way of initializing is to do it before any possible use.
- The hardware is actually initialized before and someone could attach to SPI bus some weird device.
Some weird device that does what?
You never know what people will connect to a SoM :).
Wolfram actually made a much better point: bootloaders are known to initialize some things and leave them in whatever state, assuming that the Linux kernel will redo any initialization properly.
Best regards, Krzysztof
On Sun, 14 Jun 2020 at 18:12, Krzysztof Kozlowski krzk@kernel.org wrote:
On Sun, Jun 14, 2020 at 04:43:28PM +0300, Vladimir Oltean wrote:
On Sun, 14 Jun 2020 at 16:39, Vladimir Oltean olteanv@gmail.com wrote:
On Sun, 14 Jun 2020 at 14:18, Krzysztof Kozlowski krzk@kernel.org wrote:
On Sun, Jun 14, 2020 at 02:14:15PM +0300, Vladimir Oltean wrote:
On Sun, 14 Jun 2020 at 13:56, Krzysztof Kozlowski krzk@kernel.org wrote:
If interrupt fires early, the dspi_interrupt() could complete (dspi->xfer_done) before its initialization happens.
Fixes: 4f5ee75ea171 ("spi: spi-fsl-dspi: Replace interruptible wait queue with a simple completion")
Also please note that this patch merely replaced an init_waitqueue_head with init_completion. But the "bug" (if we can call it that) originates from even before.
Yeah, I know, the Fixes is not accurate. Backport to earlier kernels would be manual so I am not sure if accurate Fixes matter.
Cc: stable@vger.kernel.org Signed-off-by: Krzysztof Kozlowski krzk@kernel.org
Why would an interrupt fire before spi_register_controller, therefore before dspi_transfer_one_message could get called? Is this master or slave mode?
I guess practically it won't fire. It's more of a matter of logical order and:
- Someone might fix the CONFIG_DEBUG_SHIRQ_FIXME one day,
And what if CONFIG_DEBUG_SHIRQ_FIXME gets fixed? I uncommented it, and still no issues. dspi_interrupt checks the status bit of the hw, sees there's nothing to do, and returns IRQ_NONE.
Indeed, still the logical way of initializing is to do it before any possible use.
- The hardware is actually initialized before and someone could attach to SPI bus some weird device.
Some weird device that does what?
You never know what people will connect to a SoM :).
Wolfram made actually much better point - bootloaders are known to initialize some things and leaving them in whatever state, assuming that Linux kernel will redo any initialization properly.
Best regards, Krzysztof
I don't buy the argument. So ok, maybe some broken bootloader leaves a SPI_SR interrupt pending (do you have any example of that?). But the driver clears interrupts by writing SPI_SR_CLEAR in dspi_init (called _before_ requesting the IRQ). It clears 10 bits from the status register. There are 2 points to be made here:
- The dspi_interrupt only handles data availability interrupt (SPI_SR_EOQF | SPI_SR_CMDTCF). Only then does it matter whether the completion was already initialized or not. But these interrupts _are_ cleared. But assume they weren't. What would Linux even do with a SPI transfer initiated by the previously running software environment? Why would it be a smart thing to handle that data in the first place?
- The 10 bits from the status register are all the bits that can be cleared. The rest of the register, if you look at it, contains the TX FIFO Counter, the Transmit Next Pointer, the RX FIFO Counter, and the Pop Next Pointer.
So, unless there's something I'm missing, I don't actually see how this broken bootloader can do any harm to us.
Thanks, -Vladimir
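Editor's sketch, not from the thread: the argument above can be modelled in userspace C. The bit positions and the MODEL_* names are illustrative assumptions only (check the driver source and the reference manual for the real SPI_SR layout), and the write-1-to-clear behaviour is assumed from the description of SPI_SR_CLEAR given in the message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative bit positions only; not taken from the hardware manual. */
#define MODEL_SR_EOQF     (UINT32_C(1) << 28)
#define MODEL_SR_CMDTCF   (UINT32_C(1) << 23)
/* Flags that can be cleared by software; the model keeps just two. */
#define MODEL_SR_W1C_MASK (MODEL_SR_EOQF | MODEL_SR_CMDTCF)

/* Hypothetical register file holding only the status register. */
struct model_regs {
	uint32_t sr;
};

/* Assumed write-1-to-clear semantics: each 1 written clears that flag. */
static void model_sr_write(struct model_regs *r, uint32_t val)
{
	assert(r != 0);
	r->sr &= ~(val & MODEL_SR_W1C_MASK);
}

/* Stand-in for the status clearing that dspi_init does before the IRQ. */
static void model_dspi_init(struct model_regs *r)
{
	model_sr_write(r, MODEL_SR_W1C_MASK);
}

/* True if the modelled handler would go on to signal the completion. */
static bool model_irq_would_complete(const struct model_regs *r)
{
	return (r->sr & (MODEL_SR_EOQF | MODEL_SR_CMDTCF)) != 0;
}
```

With stale flags left behind by earlier software, model_irq_would_complete() is true until model_dspi_init() has run; afterwards only the read-only counter bits remain and the modelled handler has nothing to do.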
On Sun, Jun 14, 2020 at 06:34:33PM +0300, Vladimir Oltean wrote:
On Sun, 14 Jun 2020 at 18:12, Krzysztof Kozlowski krzk@kernel.org wrote:
On Sun, Jun 14, 2020 at 04:43:28PM +0300, Vladimir Oltean wrote:
On Sun, 14 Jun 2020 at 16:39, Vladimir Oltean olteanv@gmail.com wrote:
On Sun, 14 Jun 2020 at 14:18, Krzysztof Kozlowski krzk@kernel.org wrote:
On Sun, Jun 14, 2020 at 02:14:15PM +0300, Vladimir Oltean wrote:
On Sun, 14 Jun 2020 at 13:56, Krzysztof Kozlowski krzk@kernel.org wrote:
If interrupt fires early, the dspi_interrupt() could complete (dspi->xfer_done) before its initialization happens.
Fixes: 4f5ee75ea171 ("spi: spi-fsl-dspi: Replace interruptible wait queue with a simple completion")
Also please note that this patch merely replaced an init_waitqueue_head with init_completion. But the "bug" (if we can call it that) originates from even before.
Yeah, I know, the Fixes is not accurate. Backport to earlier kernels would be manual so I am not sure if accurate Fixes matter.
Cc: stable@vger.kernel.org
Signed-off-by: Krzysztof Kozlowski krzk@kernel.org
Why would an interrupt fire before spi_register_controller, therefore before dspi_transfer_one_message could get called? Is this master or slave mode?
I guess practically it won't fire. It's more of a matter of logical order and:
- Someone might fix the CONFIG_DEBUG_SHIRQ_FIXME one day,
And what if CONFIG_DEBUG_SHIRQ_FIXME gets fixed? I uncommented it, and still no issues. dspi_interrupt checks the status bit of the hw, sees there's nothing to do, and returns IRQ_NONE.
Indeed, still the logical way of initializing is to do it before any possible use.
- The hardware is actually initialized before and someone could attach to SPI bus some weird device.
Some weird device that does what?
You never know what people will connect to a SoM :).
Wolfram made actually much better point - bootloaders are known to initialize some things and leaving them in whatever state, assuming that Linux kernel will redo any initialization properly.
Best regards, Krzysztof
I don't buy the argument. So ok, maybe some broken bootloader leaves a SPI_SR interrupt pending (do you have any example of that?). But the driver clears interrupts by writing SPI_SR_CLEAR in dspi_init (called _before_ requesting the IRQ). It clears 10 bits from the status register. There are 2 points to be made here:
- The dspi_interrupt only handles data availability interrupt (SPI_SR_EOQF | SPI_SR_CMDTCF). Only then does it matter whether the completion was already initialized or not. But these interrupts _are_ cleared. But assume they weren't. What would Linux even do with a SPI transfer initiated by the previously running software environment? Why would it be a smart thing to handle that data in the first place?
- The 10 bits from the status register are all the bits that can be cleared. The rest of the register, if you look at it, contains the TX FIFO Counter, the Transmit Next Pointer, the RX FIFO Counter, and the Pop Next Pointer.
So, unless there's something I'm missing, I don't actually see how this broken bootloader can do any harm to us.
Let's rephrase it: you think therefore that completion should be initialized *after* requesting shared interrupts? You think that exactly that order shall be used in the source code?
Best regards, Krzysztof
On Mon, 15 Jun 2020 at 10:09, Krzysztof Kozlowski krzk@kernel.org wrote:
On Sun, Jun 14, 2020 at 06:34:33PM +0300, Vladimir Oltean wrote:
On Sun, 14 Jun 2020 at 18:12, Krzysztof Kozlowski krzk@kernel.org wrote:
On Sun, Jun 14, 2020 at 04:43:28PM +0300, Vladimir Oltean wrote:
On Sun, 14 Jun 2020 at 16:39, Vladimir Oltean olteanv@gmail.com wrote:
On Sun, 14 Jun 2020 at 14:18, Krzysztof Kozlowski krzk@kernel.org wrote:
On Sun, Jun 14, 2020 at 02:14:15PM +0300, Vladimir Oltean wrote:
On Sun, 14 Jun 2020 at 13:56, Krzysztof Kozlowski krzk@kernel.org wrote:
If interrupt fires early, the dspi_interrupt() could complete (dspi->xfer_done) before its initialization happens.
Fixes: 4f5ee75ea171 ("spi: spi-fsl-dspi: Replace interruptible wait queue with a simple completion")
Also please note that this patch merely replaced an init_waitqueue_head with init_completion. But the "bug" (if we can call it that) originates from even before.
Yeah, I know, the Fixes is not accurate. Backport to earlier kernels would be manual so I am not sure if accurate Fixes matter.
Cc: stable@vger.kernel.org
Signed-off-by: Krzysztof Kozlowski krzk@kernel.org
Why would an interrupt fire before spi_register_controller, therefore before dspi_transfer_one_message could get called? Is this master or slave mode?
I guess practically it won't fire. It's more of a matter of logical order and:
- Someone might fix the CONFIG_DEBUG_SHIRQ_FIXME one day,
And what if CONFIG_DEBUG_SHIRQ_FIXME gets fixed? I uncommented it, and still no issues. dspi_interrupt checks the status bit of the hw, sees there's nothing to do, and returns IRQ_NONE.
Indeed, still the logical way of initializing is to do it before any possible use.
- The hardware is actually initialized before and someone could attach to SPI bus some weird device.
Some weird device that does what?
You never know what people will connect to a SoM :).
Wolfram made actually much better point - bootloaders are known to initialize some things and leaving them in whatever state, assuming that Linux kernel will redo any initialization properly.
Best regards, Krzysztof
I don't buy the argument. So ok, maybe some broken bootloader leaves a SPI_SR interrupt pending (do you have any example of that?). But the driver clears interrupts by writing SPI_SR_CLEAR in dspi_init (called _before_ requesting the IRQ). It clears 10 bits from the status register. There are 2 points to be made here:
- The dspi_interrupt only handles data availability interrupt (SPI_SR_EOQF | SPI_SR_CMDTCF). Only then does it matter whether the completion was already initialized or not. But these interrupts _are_ cleared. But assume they weren't. What would Linux even do with a SPI transfer initiated by the previously running software environment? Why would it be a smart thing to handle that data in the first place?
- The 10 bits from the status register are all the bits that can be cleared. The rest of the register, if you look at it, contains the TX FIFO Counter, the Transmit Next Pointer, the RX FIFO Counter, and the Pop Next Pointer.
So, unless there's something I'm missing, I don't actually see how this broken bootloader can do any harm to us.
Let's rephrase it: you think therefore that completion should be initialized *after* requesting shared interrupts? You think that exactly that order shall be used in the source code?
Best regards, Krzysztof
I think that completion should be initialized before it is used, just like any other variable. So far you have not proven any code path through which it can be used uninitialized, therefore I don't see why this should be accepted as a bug fix. Cleanup, cosmetic refactoring, design patterns, whatever, sure.
Thanks, -Vladimir
On Mon, Jun 15, 2020 at 12:26:37PM +0300, Vladimir Oltean wrote:
Let's rephrase it: you think therefore that completion should be initialized *after* requesting shared interrupts? You think that exactly that order shall be used in the source code?
Best regards, Krzysztof
I think that completion should be initialized before it is used, just like any other variable. So far you have not proven any code path through which it can be used uninitialized, therefore I don't see why this should be accepted as a bug fix. Cleanup, cosmetic refactoring, design patterns, whatever, sure.
Sure, let's call it a cleanup or cosmetic refactoring then.
Best regards, Krzysztof
If interrupt fires early, the dspi_interrupt() could complete (dspi->xfer_done) before its initialization happens.
Fixes: 4f5ee75ea171 ("spi: spi-fsl-dspi: Replace interruptible wait queue with a simple completion") Cc: stable@vger.kernel.org Signed-off-by: Krzysztof Kozlowski krzk@kernel.org
Why would an interrupt fire before spi_register_controller, therefore before dspi_transfer_one_message could get called?
I don't know this HW, but the generic answer usually is: Bootloader used SPI and didn't clean up properly.
On Sun, 14 Jun 2020 at 13:57, Krzysztof Kozlowski krzk@kernel.org wrote:
If interrupt comes late, during probe error path or device remove (could be triggered with CONFIG_DEBUG_SHIRQ), the interrupt handler dspi_interrupt() will access registers with the clock being disabled. This leads to external abort on non-linefetch on Toradex Colibri VF50 module (with Vybrid VF5xx):
  $ echo 4002d000.spi > /sys/devices/platform/soc/40000000.bus/4002d000.spi/driver/unbind

  Unhandled fault: external abort on non-linefetch (0x1008) at 0x8887f02c
  Internal error: : 1008 [#1] ARM
  CPU: 0 PID: 136 Comm: sh Not tainted 5.7.0-next-20200610-00009-g5c913fa0f9c5-dirty #74
  Hardware name: Freescale Vybrid VF5xx/VF6xx (Device Tree)
  (regmap_mmio_read32le) from [<8061885c>] (regmap_mmio_read+0x48/0x68)
  (regmap_mmio_read) from [<8060e3b8>] (_regmap_bus_reg_read+0x24/0x28)
  (_regmap_bus_reg_read) from [<80611c50>] (_regmap_read+0x70/0x1c0)
  (_regmap_read) from [<80611dec>] (regmap_read+0x4c/0x6c)
  (regmap_read) from [<80678ca0>] (dspi_interrupt+0x3c/0xa8)
  (dspi_interrupt) from [<8017acec>] (free_irq+0x26c/0x3cc)
  (free_irq) from [<8017dcec>] (devm_irq_release+0x1c/0x20)
  (devm_irq_release) from [<805f98ec>] (release_nodes+0x1e4/0x298)
  (release_nodes) from [<805f9ac8>] (devres_release_all+0x40/0x60)
  (devres_release_all) from [<805f5134>] (device_release_driver_internal+0x108/0x1ac)
  (device_release_driver_internal) from [<805f521c>] (device_driver_detach+0x20/0x24)
The resource-managed framework should not be used for interrupt handling, because the resource will be released too late - after disabling clocks. The interrupt handler is not prepared for such case.
Fixes: 349ad66c0ab0 ("spi:Add Freescale DSPI driver for Vybrid VF610 platform") Cc: stable@vger.kernel.org Signed-off-by: Krzysztof Kozlowski krzk@kernel.org
I don't buy this argument that "the resource-managed framework should not be used for interrupt handling". What is it there for, then? Could you just call disable_irq before clk_disable_unprepare instead of this massive rework?
This is a follow-up to my other patch for the I2C IMX driver [1]. Let's fix the issues consistently.
[1] https://lore.kernel.org/lkml/1592130544-19759-2-git-send-email-krzk@kernel.o...
 drivers/spi/spi-fsl-dspi.c | 13 +++++++++----
 1 file changed, 9 insertions(+), 4 deletions(-)

diff --git a/drivers/spi/spi-fsl-dspi.c b/drivers/spi/spi-fsl-dspi.c
index 58190c94561f..57e7a626ba00 100644
--- a/drivers/spi/spi-fsl-dspi.c
+++ b/drivers/spi/spi-fsl-dspi.c
@@ -1385,8 +1385,8 @@ static int dspi_probe(struct platform_device *pdev)
 		goto poll_mode;
 	}
 
-	ret = devm_request_irq(&pdev->dev, dspi->irq, dspi_interrupt,
-			       IRQF_SHARED, pdev->name, dspi);
+	ret = request_threaded_irq(dspi->irq, dspi_interrupt, NULL,
+				   IRQF_SHARED, pdev->name, dspi);
 	if (ret < 0) {
 		dev_err(&pdev->dev, "Unable to attach DSPI interrupt\n");
 		goto out_clk_put;
@@ -1400,7 +1400,7 @@ static int dspi_probe(struct platform_device *pdev)
 		ret = dspi_request_dma(dspi, res->start);
 		if (ret < 0) {
 			dev_err(&pdev->dev, "can't get dma channels\n");
-			goto out_clk_put;
+			goto out_free_irq;
 		}
 	}
 
@@ -1415,11 +1415,14 @@ static int dspi_probe(struct platform_device *pdev)
 	ret = spi_register_controller(ctlr);
 	if (ret != 0) {
 		dev_err(&pdev->dev, "Problem registering DSPI ctlr\n");
-		goto out_clk_put;
+		goto out_free_irq;
 	}
 
 	return ret;
 
+out_free_irq:
+	if (dspi->irq > 0)
+		free_irq(dspi->irq, dspi);
 out_clk_put:
 	clk_disable_unprepare(dspi->clk);
 out_ctlr_put:
@@ -1435,6 +1438,8 @@ static int dspi_remove(struct platform_device *pdev)
 
 	/* Disconnect from the SPI framework */
 	dspi_release_dma(dspi);
+	if (dspi->irq > 0)
+		free_irq(dspi->irq, dspi);
 	clk_disable_unprepare(dspi->clk);
 	spi_unregister_controller(dspi->ctlr);
--
2.7.4
Thanks, -Vladimir
On Sun, Jun 14, 2020 at 06:48:04PM +0300, Vladimir Oltean wrote:
On Sun, 14 Jun 2020 at 13:57, Krzysztof Kozlowski krzk@kernel.org wrote:
If interrupt comes late, during probe error path or device remove (could be triggered with CONFIG_DEBUG_SHIRQ), the interrupt handler dspi_interrupt() will access registers with the clock being disabled. This leads to external abort on non-linefetch on Toradex Colibri VF50 module (with Vybrid VF5xx):
  $ echo 4002d000.spi > /sys/devices/platform/soc/40000000.bus/4002d000.spi/driver/unbind

  Unhandled fault: external abort on non-linefetch (0x1008) at 0x8887f02c
  Internal error: : 1008 [#1] ARM
  CPU: 0 PID: 136 Comm: sh Not tainted 5.7.0-next-20200610-00009-g5c913fa0f9c5-dirty #74
  Hardware name: Freescale Vybrid VF5xx/VF6xx (Device Tree)
  (regmap_mmio_read32le) from [<8061885c>] (regmap_mmio_read+0x48/0x68)
  (regmap_mmio_read) from [<8060e3b8>] (_regmap_bus_reg_read+0x24/0x28)
  (_regmap_bus_reg_read) from [<80611c50>] (_regmap_read+0x70/0x1c0)
  (_regmap_read) from [<80611dec>] (regmap_read+0x4c/0x6c)
  (regmap_read) from [<80678ca0>] (dspi_interrupt+0x3c/0xa8)
  (dspi_interrupt) from [<8017acec>] (free_irq+0x26c/0x3cc)
  (free_irq) from [<8017dcec>] (devm_irq_release+0x1c/0x20)
  (devm_irq_release) from [<805f98ec>] (release_nodes+0x1e4/0x298)
  (release_nodes) from [<805f9ac8>] (devres_release_all+0x40/0x60)
  (devres_release_all) from [<805f5134>] (device_release_driver_internal+0x108/0x1ac)
  (device_release_driver_internal) from [<805f521c>] (device_driver_detach+0x20/0x24)
The resource-managed framework should not be used for interrupt handling, because the resource will be released too late - after disabling clocks. The interrupt handler is not prepared for such case.
Fixes: 349ad66c0ab0 ("spi:Add Freescale DSPI driver for Vybrid VF610 platform") Cc: stable@vger.kernel.org Signed-off-by: Krzysztof Kozlowski krzk@kernel.org
I don't buy this argument that "the resource-managed framework should not be used for interrupt handling". What is it there for, then?
It was created a long time ago for memory allocations, and since then people have ported it to all other kinds of resources and used it in drivers. Just because you can do something does not necessarily mean that you should...
Could you just call disable_irq before clk_disable_unprepare instead of this massive rework?
This massive rework is 9 insertions and 4 deletions, indeed I made impressive, huge commit with significant impact. disable_irq() could work as well so if this is preferred, no problem from my side.
Best regards, Krzysztof
On Mon, Jun 15, 2020 at 09:15:40AM +0200, Krzysztof Kozlowski wrote:
On Sun, Jun 14, 2020 at 06:48:04PM +0300, Vladimir Oltean wrote:
On Sun, 14 Jun 2020 at 13:57, Krzysztof Kozlowski krzk@kernel.org wrote:
If interrupt comes late, during probe error path or device remove (could be triggered with CONFIG_DEBUG_SHIRQ), the interrupt handler dspi_interrupt() will access registers with the clock being disabled. This leads to external abort on non-linefetch on Toradex Colibri VF50 module (with Vybrid VF5xx):
  $ echo 4002d000.spi > /sys/devices/platform/soc/40000000.bus/4002d000.spi/driver/unbind

  Unhandled fault: external abort on non-linefetch (0x1008) at 0x8887f02c
  Internal error: : 1008 [#1] ARM
  CPU: 0 PID: 136 Comm: sh Not tainted 5.7.0-next-20200610-00009-g5c913fa0f9c5-dirty #74
  Hardware name: Freescale Vybrid VF5xx/VF6xx (Device Tree)
  (regmap_mmio_read32le) from [<8061885c>] (regmap_mmio_read+0x48/0x68)
  (regmap_mmio_read) from [<8060e3b8>] (_regmap_bus_reg_read+0x24/0x28)
  (_regmap_bus_reg_read) from [<80611c50>] (_regmap_read+0x70/0x1c0)
  (_regmap_read) from [<80611dec>] (regmap_read+0x4c/0x6c)
  (regmap_read) from [<80678ca0>] (dspi_interrupt+0x3c/0xa8)
  (dspi_interrupt) from [<8017acec>] (free_irq+0x26c/0x3cc)
  (free_irq) from [<8017dcec>] (devm_irq_release+0x1c/0x20)
  (devm_irq_release) from [<805f98ec>] (release_nodes+0x1e4/0x298)
  (release_nodes) from [<805f9ac8>] (devres_release_all+0x40/0x60)
  (devres_release_all) from [<805f5134>] (device_release_driver_internal+0x108/0x1ac)
  (device_release_driver_internal) from [<805f521c>] (device_driver_detach+0x20/0x24)
The resource-managed framework should not be used for interrupt handling, because the resource will be released too late - after disabling clocks. The interrupt handler is not prepared for such case.
Fixes: 349ad66c0ab0 ("spi:Add Freescale DSPI driver for Vybrid VF610 platform") Cc: stable@vger.kernel.org Signed-off-by: Krzysztof Kozlowski krzk@kernel.org
I don't buy this argument that "the resource-managed framework should not be used for interrupt handling". What is it there for, then?
It was created long time ago for memory allocations and since then people ported to all other possibilities and used in drivers. Just because you can do something, does not necessarily mean that you should...
Could you just call disable_irq before clk_disable_unprepare instead of this massive rework?
This massive rework is 9 insertions and 4 deletions, indeed I made impressive, huge commit with significant impact. disable_irq() could work as well so if this is preferred, no problem from my side.
disable_irq() should fix the real-world case but won't fix DEBUG_SHIRQ. I'll rework it as well, but then we are back to a bigger change again.
Best regards, Krzysztof
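Editor's sketch, not from the thread: a userspace model of why disable_irq() alone would cover a late hardware interrupt but not the CONFIG_DEBUG_SHIRQ case. All names are hypothetical stand-ins. The model assumes that the debug option calls the handler directly while the shared IRQ is being freed, which is what the free_irq to dspi_interrupt frames in the backtrace show, so masking the line does not prevent that call.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model; none of these names are kernel APIs. */
struct model_irq {
	bool disabled;    /* line masked by a disable_irq() stand-in */
	bool clk_enabled; /* stands in for the DSPI clock gate */
	int faults;       /* register reads attempted with the clock off */
};

/* Stand-in for dspi_interrupt(): it reads a register first. */
static void model_handler(struct model_irq *i)
{
	assert(i != NULL);
	if (!i->clk_enabled)
		i->faults++;
}

/* A real hardware interrupt is only delivered while the line is enabled. */
static void model_hw_irq(struct model_irq *i)
{
	if (!i->disabled)
		model_handler(i);
}

/* With the debug option, freeing a shared IRQ calls the handler directly. */
static void model_free_irq(struct model_irq *i, bool debug_shirq)
{
	if (debug_shirq)
		model_handler(i);
}

/* remove() that masks the IRQ, stops the clock, and leaves the free to devm. */
static int remove_with_disable_irq(bool debug_shirq)
{
	struct model_irq i = { .clk_enabled = true };

	i.disabled = true;               /* disable_irq() stand-in */
	i.clk_enabled = false;           /* clk_disable_unprepare() stand-in */
	model_hw_irq(&i);                /* late hardware interrupt: masked */
	model_free_irq(&i, debug_shirq); /* devres release after remove() */
	return i.faults;
}
```

In this model remove_with_disable_irq(false) stays fault free, while remove_with_disable_irq(true) still records one access with the clock off, matching the remark that disable_irq() would not fix DEBUG_SHIRQ.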
linux-stable-mirror@lists.linaro.org