The patch below does not apply to the 3.18-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 9e528c799d17a4ac37d788c81440b50377dd592d Mon Sep 17 00:00:00 2001
From: Lukas Wunner <lukas(a)wunner.de>
Date: Wed, 23 Jan 2019 09:26:00 +0100
Subject: [PATCH] dmaengine: bcm2835: Fix abort of transactions
There are multiple issues with bcm2835_dma_abort() (which is called on
termination of a transaction):
* The algorithm to abort the transaction first pauses the channel by
clearing the ACTIVE flag in the CS register, then waits for the PAUSED
flag to clear. Page 49 of the spec documents the latter as follows:
"Indicates if the DMA is currently paused and not transferring data.
This will occur if the active bit has been cleared [...]"
https://www.raspberrypi.org/app/uploads/2012/02/BCM2835-ARM-Peripherals.pdf
So the function enters a de facto infinite loop: it waits for PAUSED to
clear, but PAUSED is always set precisely because the function itself
has just cleared the ACTIVE flag. The only thing saving it from spinning
forever is the upper bound of 10000 loop iterations.
The code comment says that the intention is to "wait for any current
AXI transfer to complete", so the author probably wanted to check the
WAITING_FOR_OUTSTANDING_WRITES flag instead. Amend the function
accordingly.
* The CS register is only read at the beginning of the function. It
needs to be read again after pausing the channel and before checking
for outstanding writes, otherwise writes which were issued between
the register read at the beginning of the function and pausing the
channel may not be waited for.
* The function seeks to abort the transfer by writing 0 to the NEXTCONBK
register and setting the ABORT and ACTIVE flags. Thereby, the 0 in
NEXTCONBK is sought to be loaded into the CONBLK_AD register. However
experimentation has shown this approach to not work: The CONBLK_AD
register remains the same as before and the CS register contains
0x00000030 (PAUSED | DREQ_STOPS_DMA). In other words, the control
block is not aborted but merely paused and it will be resumed once the
next DMA transaction is started. That is absolutely not the desired
behavior.
A simpler approach is to set the channel's RESET flag instead. This
reliably zeroes the NEXTCONBK as well as the CS register. It requires
less code and only a single MMIO write. This is also what popular
user space DMA drivers do, e.g.:
https://github.com/metachris/RPIO/blob/master/source/c_pwm/pwm.c
Note that the spec is contradictory as to whether the NEXTCONBK register
is writeable at all. On the one hand, page 41 claims:
"The value loaded into the NEXTCONBK register can be overwritten so
that the linked list of Control Block data structures can be
dynamically altered. However it is only safe to do this when the DMA
is paused."
On the other hand, page 40 specifies:
"Only three registers in each channel's register set are directly
writeable (CS, CONBLK_AD and DEBUG). The other registers (TI,
SOURCE_AD, DEST_AD, TXFR_LEN, STRIDE & NEXTCONBK), are automatically
loaded from a Control Block data structure held in external memory."
Fixes: 96286b576690 ("dmaengine: Add support for BCM2835")
Signed-off-by: Lukas Wunner <lukas(a)wunner.de>
Cc: stable(a)vger.kernel.org # v3.14+
Cc: Frank Pavlic <f.pavlic(a)kunbus.de>
Cc: Martin Sperl <kernel(a)martin.sperl.org>
Cc: Florian Meier <florian.meier(a)koalo.de>
Cc: Clive Messer <clive.m.messer(a)gmail.com>
Cc: Matthias Reichl <hias(a)horus.com>
Tested-by: Stefan Wahren <stefan.wahren(a)i2se.com>
Acked-by: Florian Kauer <florian.kauer(a)koalo.de>
Signed-off-by: Vinod Koul <vkoul(a)kernel.org>
diff --git a/drivers/dma/bcm2835-dma.c b/drivers/dma/bcm2835-dma.c
index 0c3f5c71bb48..ae10f5614f95 100644
--- a/drivers/dma/bcm2835-dma.c
+++ b/drivers/dma/bcm2835-dma.c
@@ -406,13 +406,11 @@ static void bcm2835_dma_fill_cb_chain_with_sg(
}
}
-static int bcm2835_dma_abort(void __iomem *chan_base)
+static int bcm2835_dma_abort(struct bcm2835_chan *c)
{
- unsigned long cs;
+ void __iomem *chan_base = c->chan_base;
long int timeout = 10000;
- cs = readl(chan_base + BCM2835_DMA_CS);
-
/*
* A zero control block address means the channel is idle.
* (The ACTIVE flag in the CS register is not a reliable indicator.)
@@ -424,25 +422,16 @@ static int bcm2835_dma_abort(void __iomem *chan_base)
writel(0, chan_base + BCM2835_DMA_CS);
/* Wait for any current AXI transfer to complete */
- while ((cs & BCM2835_DMA_ISPAUSED) && --timeout) {
+ while ((readl(chan_base + BCM2835_DMA_CS) &
+ BCM2835_DMA_WAITING_FOR_WRITES) && --timeout)
cpu_relax();
- cs = readl(chan_base + BCM2835_DMA_CS);
- }
- /* We'll un-pause when we set of our next DMA */
+ /* Peripheral might be stuck and fail to signal AXI write responses */
if (!timeout)
- return -ETIMEDOUT;
-
- if (!(cs & BCM2835_DMA_ACTIVE))
- return 0;
-
- /* Terminate the control block chain */
- writel(0, chan_base + BCM2835_DMA_NEXTCB);
-
- /* Abort the whole DMA */
- writel(BCM2835_DMA_ABORT | BCM2835_DMA_ACTIVE,
- chan_base + BCM2835_DMA_CS);
+ dev_err(c->vc.chan.device->dev,
+ "failed to complete outstanding writes\n");
+ writel(BCM2835_DMA_RESET, chan_base + BCM2835_DMA_CS);
return 0;
}
@@ -787,7 +776,6 @@ static int bcm2835_dma_terminate_all(struct dma_chan *chan)
struct bcm2835_chan *c = to_bcm2835_dma_chan(chan);
struct bcm2835_dmadev *d = to_bcm2835_dma_dev(c->vc.chan.device);
unsigned long flags;
- int timeout = 10000;
LIST_HEAD(head);
spin_lock_irqsave(&c->vc.lock, flags);
@@ -801,18 +789,7 @@ static int bcm2835_dma_terminate_all(struct dma_chan *chan)
if (c->desc) {
vchan_terminate_vdesc(&c->desc->vd);
c->desc = NULL;
- bcm2835_dma_abort(c->chan_base);
-
- /* Wait for stopping */
- while (--timeout) {
- if (!readl(c->chan_base + BCM2835_DMA_ADDR))
- break;
-
- cpu_relax();
- }
-
- if (!timeout)
- dev_err(d->ddev.dev, "DMA transfer could not be terminated\n");
+ bcm2835_dma_abort(c);
}
vchan_get_all_descriptors(&c->vc, &head);
The patch below does not apply to the 4.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
The patch below does not apply to the 3.18-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From f7da7782aba92593f7b82f03d2409a1c5f4db91b Mon Sep 17 00:00:00 2001
From: Lukas Wunner <lukas(a)wunner.de>
Date: Wed, 23 Jan 2019 09:26:00 +0100
Subject: [PATCH] dmaengine: bcm2835: Fix interrupt race on RT
If IRQ handlers are threaded (either because CONFIG_PREEMPT_RT_BASE is
enabled or "threadirqs" was passed on the command line) and if system
load is sufficiently high that wakeup latency of IRQ threads degrades,
SPI DMA transactions on the BCM2835 occasionally break like this:
ks8851 spi0.0: SPI transfer timed out
bcm2835-dma 3f007000.dma: DMA transfer could not be terminated
ks8851 spi0.0 eth2: ks8851_rdfifo: spi_sync() failed
The root cause is an assumption made by the DMA driver which is
documented in a code comment in bcm2835_dma_terminate_all():
/*
* Stop DMA activity: we assume the callback will not be called
* after bcm_dma_abort() returns (even if it does, it will see
* c->desc is NULL and exit.)
*/
That assumption falls apart if the IRQ handler bcm2835_dma_callback() is
threaded: A client may terminate a descriptor and issue a new one
before the IRQ handler had a chance to run. In fact the IRQ handler may
miss an *arbitrary* number of descriptors. The result is the following
race condition:
1. A descriptor finishes, its interrupt is deferred to the IRQ thread.
2. A client calls dma_terminate_async() which sets channel->desc = NULL.
3. The client issues a new descriptor. Because channel->desc is NULL,
bcm2835_dma_issue_pending() immediately starts the descriptor.
4. Finally the IRQ thread runs and writes BCM2835_DMA_INT to the CS
register to acknowledge the interrupt. This clears the ACTIVE flag,
so the newly issued descriptor is paused in the middle of the
transaction. Because channel->desc is not NULL, the IRQ thread
finalizes the descriptor and tries to start the next one.
I see two possible solutions: The first is to call synchronize_irq()
in bcm2835_dma_issue_pending() to wait until the IRQ thread has
finished before issuing a new descriptor. The downside of this approach
is unnecessary latency if clients desire rapidly terminating and
re-issuing descriptors and don't have any use for an IRQ callback.
(The SPI TX DMA channel is a case in point.)
A better alternative is to make the IRQ thread recognize that it has
missed descriptors and avoid finalizing the newly issued descriptor.
So first of all, set the ACTIVE flag when acknowledging the interrupt.
This keeps a newly issued descriptor running.
If the descriptor was finished, the channel remains idle despite the
ACTIVE flag being set. However the ACTIVE flag can then no longer be
used to check whether the channel is idle, so instead check whether
the register containing the current control block address is zero
and finalize the current descriptor only if so.
That way, there is no impact on latency and throughput if the client
doesn't care for the interrupt: Only minimal additional overhead is
introduced for non-cyclic descriptors as one further MMIO read is
necessary per interrupt to check for idleness of the channel. Cyclic
descriptors are sped up slightly by removing one MMIO write per
interrupt.
Fixes: 96286b576690 ("dmaengine: Add support for BCM2835")
Signed-off-by: Lukas Wunner <lukas(a)wunner.de>
Cc: stable(a)vger.kernel.org # v3.14+
Cc: Frank Pavlic <f.pavlic(a)kunbus.de>
Cc: Martin Sperl <kernel(a)martin.sperl.org>
Cc: Florian Meier <florian.meier(a)koalo.de>
Cc: Clive Messer <clive.m.messer(a)gmail.com>
Cc: Matthias Reichl <hias(a)horus.com>
Tested-by: Stefan Wahren <stefan.wahren(a)i2se.com>
Acked-by: Florian Kauer <florian.kauer(a)koalo.de>
Signed-off-by: Vinod Koul <vkoul(a)kernel.org>
diff --git a/drivers/dma/bcm2835-dma.c b/drivers/dma/bcm2835-dma.c
index 1a44c8086d77..0c3f5c71bb48 100644
--- a/drivers/dma/bcm2835-dma.c
+++ b/drivers/dma/bcm2835-dma.c
@@ -412,7 +412,12 @@ static int bcm2835_dma_abort(void __iomem *chan_base)
long int timeout = 10000;
cs = readl(chan_base + BCM2835_DMA_CS);
- if (!(cs & BCM2835_DMA_ACTIVE))
+
+ /*
+ * A zero control block address means the channel is idle.
+ * (The ACTIVE flag in the CS register is not a reliable indicator.)
+ */
+ if (!readl(chan_base + BCM2835_DMA_ADDR))
return 0;
/* Write 0 to the active bit - Pause the DMA */
@@ -476,8 +481,15 @@ static irqreturn_t bcm2835_dma_callback(int irq, void *data)
spin_lock_irqsave(&c->vc.lock, flags);
- /* Acknowledge interrupt */
- writel(BCM2835_DMA_INT, c->chan_base + BCM2835_DMA_CS);
+ /*
+ * Clear the INT flag to receive further interrupts. Keep the channel
+ * active in case the descriptor is cyclic or in case the client has
+ * already terminated the descriptor and issued a new one. (May happen
+ * if this IRQ handler is threaded.) If the channel is finished, it
+ * will remain idle despite the ACTIVE flag being set.
+ */
+ writel(BCM2835_DMA_INT | BCM2835_DMA_ACTIVE,
+ c->chan_base + BCM2835_DMA_CS);
d = c->desc;
@@ -485,11 +497,7 @@ static irqreturn_t bcm2835_dma_callback(int irq, void *data)
if (d->cyclic) {
/* call the cyclic callback */
vchan_cyclic_callback(&d->vd);
-
- /* Keep the DMA engine running */
- writel(BCM2835_DMA_ACTIVE,
- c->chan_base + BCM2835_DMA_CS);
- } else {
+ } else if (!readl(c->chan_base + BCM2835_DMA_ADDR)) {
vchan_cookie_complete(&c->desc->vd);
bcm2835_dma_start_desc(c);
}
@@ -789,11 +797,7 @@ static int bcm2835_dma_terminate_all(struct dma_chan *chan)
list_del_init(&c->node);
spin_unlock(&d->lock);
- /*
- * Stop DMA activity: we assume the callback will not be called
- * after bcm_dma_abort() returns (even if it does, it will see
- * c->desc is NULL and exit.)
- */
+ /* stop DMA activity */
if (c->desc) {
vchan_terminate_vdesc(&c->desc->vd);
c->desc = NULL;
@@ -801,8 +805,7 @@ static int bcm2835_dma_terminate_all(struct dma_chan *chan)
/* Wait for stopping */
while (--timeout) {
- if (!(readl(c->chan_base + BCM2835_DMA_CS) &
- BCM2835_DMA_ACTIVE))
+ if (!readl(c->chan_base + BCM2835_DMA_ADDR))
break;
cpu_relax();
The patch below does not apply to the 4.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
One of our servers recently hit a kernel crash and the callstack is:
[6469391.997662] BUG: unable to handle kernel NULL pointer dereference at 0000000000000070
[6469392.005693] IP: [<ffffffff811cad80>] shmem_unused_huge_count+0x10/0x20
[6469392.012412] PGD 1000c21067
[6469392.015203] PUD ffc306067
[6469392.018089] PMD 0
[6469392.018627]
[6469392.020303] Oops: 0000 [#1] SMP
[6469392.023621] Modules linked in: kpatch_6iljwh9b(OE) memcg_force_swapin(OE) rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache nfsd auth_rpcgss nfs_acl [last unloaded: memcg_force_swapin]
[6469392.040177] CPU: 2 PID: 89058 Comm: ilogtail Tainted: G OE K 4.9.93-010.ali3000.alios7.x86_64 #1
[6469392.049996] Hardware name: Inventec K900-1G /B900G2-1G , BIOS A2.32 10/09/2014
[6469392.060334] task: ffff8802217b1800 task.stack: ffffc9004ea88000
[6469392.066418] RIP: 0010:[<ffffffff811cad80>] [<ffffffff811cad80>] shmem_unused_huge_count+0x10/0x20
[6469392.075563] RSP: 0018:ffffc9004ea8b6c0 EFLAGS: 00010282
[6469392.081041] RAX: 0000000000000000 RBX: 0000000000000020 RCX: 0000000000000001
[6469392.088339] RDX: 0000000000000001 RSI: ffffc9004ea8b780 RDI: ffff881749bd2000
[6469392.095635] RBP: ffffc9004ea8b6c0 R08: 28f5c28f5c28f5c3 R09: ffff88173bf3fce0
[6469392.102934] R10: ffff88207ffd4000 R11: 0000000000000000 R12: ffff881749bd24c0
[6469392.110233] R13: ffffc9004ea8b780 R14: 0000000000000000 R15: ffff88207ffd4000
[6469392.117533] FS: 00007fe260420700(0000) GS:ffff88103fa80000(0000) knlGS:0000000000000000
[6469392.125792] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[6469392.131703] CR2: 0000000000000070 CR3: 00000005bb46d000 CR4: 00000000001606f0
[6469392.138999] Stack:
[6469392.141185] ffffc9004ea8b6f0 ffffffff81247bee 0000000000000020 0000000000000400
[6469392.148811] ffff881749bd24c0 0000000000000000 ffffc9004ea8b7d0 ffffffff811c431c
[6469392.156436] 0000000000000020 0000000000000000 ffff88207b82c000 0000000000000001
[6469392.164063] Call Trace:
[6469392.166692] [<ffffffff81247bee>] super_cache_count+0x3e/0xe0
[6469392.172607] [<ffffffff811c431c>] shrink_slab.part.38+0x11c/0x420
[6469392.178875] [<ffffffff811c4649>] shrink_slab+0x29/0x30
[6469392.184273] [<ffffffff811c93cf>] shrink_node+0xff/0x300
[6469392.189756] [<ffffffff811c96dd>] do_try_to_free_pages+0x10d/0x330
[6469392.196104] [<ffffffff811c9b65>] try_to_free_mem_cgroup_pages+0xd5/0x1b0
[6469392.203063] [<ffffffff81230b5d>] try_charge+0x14d/0x720
[6469392.208551] [<ffffffff8121b8e3>] ? kmem_cache_alloc+0xd3/0x1a0
[6469392.214642] [<ffffffff811b14e5>] ? mempool_alloc_slab+0x15/0x20
[6469392.220825] [<ffffffff81235b4e>] mem_cgroup_try_charge+0x6e/0x1b0
[6469392.227177] [<ffffffff811ae174>] __add_to_page_cache_locked+0x64/0x220
[6469392.233961] [<ffffffff811ae39e>] add_to_page_cache_lru+0x4e/0xe0
[6469392.240242] [<ffffffffa03ce2d1>] ext4_mpage_readpages+0x151/0x980 [ext4]
[6469392.247211] [<ffffffffa037edb5>] ext4_readpages+0x35/0x40 [ext4]
[6469392.253474] [<ffffffff811be9e7>] __do_page_cache_readahead+0x197/0x240
[6469392.260260] [<ffffffff811ae45c>] ? pagecache_get_page+0x2c/0x2a0
[6469392.266523] [<ffffffff811b0f4b>] filemap_fault+0x4db/0x590
[6469392.272282] [<ffffffffa0388fd6>] ext4_filemap_fault+0x36/0x50 [ext4]
[6469392.278896] [<ffffffff811e4a90>] __do_fault+0x80/0x170
[6469392.284292] [<ffffffff811e87b2>] do_fault+0x4c2/0x720
[6469392.289603] [<ffffffff8111513f>] ? futex_wait_queue_me+0x9f/0x120
[6469392.295954] [<ffffffff811e9162>] handle_mm_fault+0x512/0xc90
[6469392.301874] [<ffffffff8106eb8b>] __do_page_fault+0x24b/0x4d0
[6469392.307796] [<ffffffff811184c5>] ? SyS_futex+0x85/0x170
[6469392.313280] [<ffffffff8106ee40>] do_page_fault+0x30/0x80
[6469392.318850] [<ffffffff81003bf4>] ? do_syscall_64+0x74/0x180
[6469392.324679] [<ffffffff81722b68>] page_fault+0x28/0x30
[6469392.329986] Code: 00 48 83 43 38 01 4c 89 e7 c6 <48> 8b 40 70 5d c3 66 2e 0f 1f 84
[6469392.338183] RIP [<ffffffff811cad80>] shmem_unused_huge_count+0x10/0x20
[6469392.344990] RSP <ffffc9004ea8b6c0>
[6469392.348656] CR2: 0000000000000070
Google showed me Dave Chinner's fix and I think it is the right fix for
our problem (not easy to reproduce in our production environment, so I
haven't been able to confirm).
Unfortunately, this commit has only been backported to the v4.14 and
v4.16 stable kernels, not the v4.9 stable kernel, presumably due to the
rename of MS_BORN to SB_BORN starting from v4.14. To make this patch
work on v4.9, I have made one minor change to Dave's commit: it keeps
using MS_BORN. I think this is correct, but since I know very little
about fs code, please kindly review. Thanks a lot for your time.
From 5cdf1679c9120a173a2bc9dff214332e99f741bc Mon Sep 17 00:00:00 2001
From: Dave Chinner <dchinner(a)redhat.com>
Date: Fri, 11 May 2018 11:20:57 +1000
Subject: [PATCH] fs: don't scan the inode cache before SB_BORN is set
commit 79f546a696bff2590169fb5684e23d65f4d9f591 upstream.
We recently had an oops reported on a 4.14 kernel in
xfs_reclaim_inodes_count() where sb->s_fs_info pointed to garbage
and so the m_perag_tree lookup walked into lala land. It produces
an oops down this path during the failed mount:
radix_tree_gang_lookup_tag+0xc4/0x130
xfs_perag_get_tag+0x37/0xf0
xfs_reclaim_inodes_count+0x32/0x40
xfs_fs_nr_cached_objects+0x11/0x20
super_cache_count+0x35/0xc0
shrink_slab.part.66+0xb1/0x370
shrink_node+0x7e/0x1a0
try_to_free_pages+0x199/0x470
__alloc_pages_slowpath+0x3a1/0xd20
__alloc_pages_nodemask+0x1c3/0x200
cache_grow_begin+0x20b/0x2e0
fallback_alloc+0x160/0x200
kmem_cache_alloc+0x111/0x4e0
The problem is that the superblock shrinker is running before the
filesystem structures it depends on have been fully set up. i.e.
the shrinker is registered in sget(), before ->fill_super() has been
called, and the shrinker can call into the filesystem before
fill_super() does its setup work. Essentially we are exposed to
both use-after-free and use-before-initialisation bugs here.
To fix this, add a check for the SB_BORN flag in super_cache_count.
In general, this flag is not set until ->fs_mount() completes
successfully, so we know that it is set after the filesystem
setup has completed. This matches the trylock_super() behaviour
which will not let super_cache_scan() run if SB_BORN is not set, and
hence will not allow the superblock shrinker to enter the
filesystem while it is being set up or after it has failed setup
and is being torn down.
Cc: stable(a)kernel.org
Signed-Off-By: Dave Chinner <dchinner(a)redhat.com>
Signed-off-by: Al Viro <viro(a)zeniv.linux.org.uk>
Signed-off-by: Aaron Lu <aaron.lu(a)linux.alibaba.com>
---
fs/super.c | 30 ++++++++++++++++++++++++------
1 file changed, 24 insertions(+), 6 deletions(-)
diff --git a/fs/super.c b/fs/super.c
index 7e9beab77259..abe2541fb28c 100644
--- a/fs/super.c
+++ b/fs/super.c
@@ -119,13 +119,23 @@ static unsigned long super_cache_count(struct shrinker *shrink,
sb = container_of(shrink, struct super_block, s_shrink);
/*
- * Don't call trylock_super as it is a potential
- * scalability bottleneck. The counts could get updated
- * between super_cache_count and super_cache_scan anyway.
- * Call to super_cache_count with shrinker_rwsem held
- * ensures the safety of call to list_lru_shrink_count() and
- * s_op->nr_cached_objects().
+ * We don't call trylock_super() here as it is a scalability bottleneck,
+ * so we're exposed to partial setup state. The shrinker rwsem does not
+ * protect filesystem operations backing list_lru_shrink_count() or
+ * s_op->nr_cached_objects(). Counts can change between
+ * super_cache_count and super_cache_scan, so we really don't need locks
+ * here.
+ *
+ * However, if we are currently mounting the superblock, the underlying
+ * filesystem might be in a state of partial construction and hence it
+ * is dangerous to access it. trylock_super() uses a MS_BORN check to
+ * avoid this situation, so do the same here. The memory barrier is
+ * matched with the one in mount_fs() as we don't hold locks here.
*/
+ if (!(sb->s_flags & MS_BORN))
+ return 0;
+ smp_rmb();
+
if (sb->s_op && sb->s_op->nr_cached_objects)
total_objects = sb->s_op->nr_cached_objects(sb, sc);
@@ -1193,6 +1203,14 @@ mount_fs(struct file_system_type *type, int flags, const char *name, void *data)
sb = root->d_sb;
BUG_ON(!sb);
WARN_ON(!sb->s_bdi);
+
+ /*
+ * Write barrier is for super_cache_count(). We place it before setting
+ * MS_BORN as the data dependency between the two functions is the
+ * superblock structure contents that we just set up, not the MS_BORN
+ * flag.
+ */
+ smp_wmb();
sb->s_flags |= MS_BORN;
error = security_sb_kern_mount(sb, flags, secdata);
--
2.19.1.3.ge56e4f7
This reverts commit 520f08df45fbe300ed650da786a74093d658b7e1
("drm/amdgpu: Correct get_crtc_scanoutpos behavior when vpos >= vtotal")
While the explanation in that commit is correct wrt. the hardware
counter sometimes returning a position >= vtotal in vrr mode if
the query happens while we are inside the "extended front porch",
the math without this commit still returns the desired *vpos value
for achieving a proper vblank timestamping behavior in vrr mode,
according to the kernel documentation about how vblank timestamps
should behave in vrr mode.
According to the vrr mode docs, the vblank timestamp (calculated by
drm_calc_vbltimestamp_from_scanoutpos()) is always supposed to be
the one corresponding to the end of vblank as if vblank duration
is == minimum vblank duration under vrr == vblank duration of the
regular fixed refresh rate mode timing. IOW, it must be the same
timestamp as would show up in non-vrr mode, even if the current
vblank may be much longer (by an unknown amount of time), due to
an extended front porch duration.
Doing the math on this shows that we get the correct *vpos values
to feed into the drm_calc_vbltimestamp_from_scanoutpos() helper
for this to happen by using exactly the same correction:
*vpos = *vpos - vtotal - vbl_end
Therefore we don't need any special case for *vpos >= vtotal and
the special case would cause wrong timestamps.
The only quirk compared to non-vrr mode is that this can return
a *vpos >= 0 despite being in vblank, ie. while returning the
DRM_SCANOUTPOS_IN_VBLANK flag. Auditing all callers of the function
shows this doesn't matter, as they all use the dedicated
DRM_SCANOUTPOS_IN_VBLANK flag to check for "in vblank", but not
the sign of *vpos.
This patch should make sure that amdgpu_display_get_crtc_scanoutpos()
always returns a *vpos value which leads to a stable vblank timestamp,
regardless where the scanout position is during the function call.
This stability is important to not confuse the vblank_disable_immediate
logic in the DRM core's vblank counting, and to provide a stable
timestamp for future improvements of the pageflip timestamping under
vrr.
Fixes: 520f08df45fb ("drm/amdgpu: Correct get_crtc_scanoutpos behavior when vpos >= vtotal")
Signed-off-by: Mario Kleiner <mario.kleiner.de(a)gmail.com>
Cc: <stable(a)vger.kernel.org>
Cc: Nicholas Kazlauskas <nicholas.kazlauskas(a)amd.com>
Cc: Harry Wentland <harry.wentland(a)amd.com>
Cc: Alex Deucher <alexander.deucher(a)amd.com>
---
drivers/gpu/drm/amd/amdgpu/amdgpu_display.c | 21 +++++++++++++--------
1 file changed, 13 insertions(+), 8 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_display.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_display.c
index 4e944737b708..13da4e3549f6 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_display.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_display.c
@@ -779,12 +779,22 @@ bool amdgpu_display_crtc_scaling_mode_fixup(struct drm_crtc *crtc,
* Returns vpos as a positive number while in active scanout area.
* Returns vpos as a negative number inside vblank, counting the number
* of scanlines to go until end of vblank, e.g., -1 means "one scanline
- * until start of active scanout / end of vblank."
+ * until start of active scanout / end of vblank." Note that in variable
+ * refresh rate mode (vrr), vpos may be positive inside the extended front
+ * porch, despite being inside vblank, and a negative number does not
+ * always define the number of scanline to true end of vblank. Instead
+ * the vpos values behave as if the crtc would operate in fixed refresh
+ * rate mode. This allows the drm_calc_vbltimestamp_from_scanoutpos()
+ * helper to calculate appropriate and stable vblank timestamps as specified
+ * for vrr mode - corresponding to the shortest vblank duration under vrr.
+ * In vrr mode therefore check the returned Flags for presence of
+ * DRM_SCANOUTPOS_IN_VBLANK to detect if the scanout is currently inside
+ * or outside true vblank.
*
* \return Flags, or'ed together as follows:
*
* DRM_SCANOUTPOS_VALID = Query successful.
- * DRM_SCANOUTPOS_INVBL = Inside vblank.
+ * DRM_SCANOUTPOS_IN_VBLANK = Inside vblank.
* DRM_SCANOUTPOS_ACCURATE = Returned position is accurate. A lack of
* this flag means that returned position may be offset by a constant but
* unknown small number of scanlines wrt. real scanout position.
@@ -876,12 +886,7 @@ int amdgpu_display_get_crtc_scanoutpos(struct drm_device *dev,
/* Inside "upper part" of vblank area? Apply corrective offset if so: */
if (in_vbl && (*vpos >= vbl_start)) {
vtotal = mode->crtc_vtotal;
-
- /* With variable refresh rate displays the vpos can exceed
- * the vtotal value. Clamp to 0 to return -vbl_end instead
- * of guessing the remaining number of lines until scanout.
- */
- *vpos = (*vpos < vtotal) ? (*vpos - vtotal) : 0;
+ *vpos = *vpos - vtotal;
}
/* Correct for shifted end of vbl at vbl_end. */
--
2.17.1