Running the compaction_test sometimes results in out-of-memory
failures. When I debugged this, it turned out that the code to
reset the number of hugepages to the initial value is simply
broken since we write into an open sysctl file descriptor
multiple times without seeking back to the start.
Adding the lseek here fixes the problem.
Cc: stable(a)vger.kernel.org
Reported-by: Naresh Kamboju <naresh.kamboju(a)linaro.org>
Link: https://bugs.linaro.org/show_bug.cgi?id=3145
Signed-off-by: Arnd Bergmann <arnd(a)arndb.de>
---
tools/testing/selftests/vm/compaction_test.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/tools/testing/selftests/vm/compaction_test.c b/tools/testing/selftests/vm/compaction_test.c
index a65b016d4c13..1097f04e4d80 100644
--- a/tools/testing/selftests/vm/compaction_test.c
+++ b/tools/testing/selftests/vm/compaction_test.c
@@ -137,6 +137,8 @@ int check_compaction(unsigned long mem_free, unsigned int hugepage_size)
printf("No of huge pages allocated = %d\n",
(atoi(nr_hugepages)));
+ lseek(fd, 0, SEEK_SET);
+
if (write(fd, initial_nr_hugepages, strlen(initial_nr_hugepages))
!= strlen(initial_nr_hugepages)) {
perror("Failed to write value to /proc/sys/vm/nr_hugepages\n");
--
2.9.0
This is a note to let you know that I've just added the patch titled
iw_cxgb4: when flushing, complete all wrs in a chain
to the 4.14-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
iw_cxgb4-when-flushing-complete-all-wrs-in-a-chain.patch
and it can be found in the queue-4.14 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From d14587334580bc94d3ee11e8320e0c157f91ae8f Mon Sep 17 00:00:00 2001
From: Steve Wise <swise(a)opengridcomputing.com>
Date: Tue, 19 Dec 2017 14:02:10 -0800
Subject: iw_cxgb4: when flushing, complete all wrs in a chain
From: Steve Wise <swise(a)opengridcomputing.com>
commit d14587334580bc94d3ee11e8320e0c157f91ae8f upstream.
If a wr chain was posted and needed to be flushed, only the first
wr in the chain was completed with FLUSHED status. The rest were
never completed. This caused isert to hang on shutdown due to the
missing completions which left iscsi IO commands referenced, stalling
the shutdown.
Fixes: 4fe7c2962e11 ("iw_cxgb4: refactor sq/rq drain logic")
Signed-off-by: Steve Wise <swise(a)opengridcomputing.com>
Signed-off-by: Jason Gunthorpe <jgg(a)mellanox.com>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/infiniband/hw/cxgb4/qp.c | 28 ++++++++++++++++++++++++++--
1 file changed, 26 insertions(+), 2 deletions(-)
--- a/drivers/infiniband/hw/cxgb4/qp.c
+++ b/drivers/infiniband/hw/cxgb4/qp.c
@@ -862,6 +862,22 @@ static int complete_sq_drain_wr(struct c
return 0;
}
+static int complete_sq_drain_wrs(struct c4iw_qp *qhp, struct ib_send_wr *wr,
+ struct ib_send_wr **bad_wr)
+{
+ int ret = 0;
+
+ while (wr) {
+ ret = complete_sq_drain_wr(qhp, wr);
+ if (ret) {
+ *bad_wr = wr;
+ break;
+ }
+ wr = wr->next;
+ }
+ return ret;
+}
+
static void complete_rq_drain_wr(struct c4iw_qp *qhp, struct ib_recv_wr *wr)
{
struct t4_cqe cqe = {};
@@ -894,6 +910,14 @@ static void complete_rq_drain_wr(struct
}
}
+static void complete_rq_drain_wrs(struct c4iw_qp *qhp, struct ib_recv_wr *wr)
+{
+ while (wr) {
+ complete_rq_drain_wr(qhp, wr);
+ wr = wr->next;
+ }
+}
+
int c4iw_post_send(struct ib_qp *ibqp, struct ib_send_wr *wr,
struct ib_send_wr **bad_wr)
{
@@ -917,7 +941,7 @@ int c4iw_post_send(struct ib_qp *ibqp, s
*/
if (qhp->wq.flushed) {
spin_unlock_irqrestore(&qhp->lock, flag);
- err = complete_sq_drain_wr(qhp, wr);
+ err = complete_sq_drain_wrs(qhp, wr, bad_wr);
return err;
}
num_wrs = t4_sq_avail(&qhp->wq);
@@ -1066,7 +1090,7 @@ int c4iw_post_receive(struct ib_qp *ibqp
*/
if (qhp->wq.flushed) {
spin_unlock_irqrestore(&qhp->lock, flag);
- complete_rq_drain_wr(qhp, wr);
+ complete_rq_drain_wrs(qhp, wr);
return err;
}
num_wrs = t4_rq_avail(&qhp->wq);
Patches currently in stable-queue which might be from swise(a)opengridcomputing.com are
queue-4.14/iw_cxgb4-only-clear-the-armed-bit-if-a-notification-is-needed.patch
queue-4.14/iw_cxgb4-atomically-flush-the-qp.patch
queue-4.14/iw_cxgb4-when-flushing-complete-all-wrs-in-a-chain.patch
queue-4.14/iw_cxgb4-reflect-the-original-wr-opcode-in-drain-cqes.patch
queue-4.14/iw_cxgb4-only-call-the-cq-comp_handler-when-the-cq-is-armed.patch
This is a note to let you know that I've just added the patch titled
iw_cxgb4: only clear the ARMED bit if a notification is needed
to the 4.14-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
iw_cxgb4-only-clear-the-armed-bit-if-a-notification-is-needed.patch
and it can be found in the queue-4.14 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From 335ebf6fa35ca1c59b73f76fad19b249d3550e86 Mon Sep 17 00:00:00 2001
From: Steve Wise <swise(a)opengridcomputing.com>
Date: Thu, 30 Nov 2017 09:41:56 -0800
Subject: iw_cxgb4: only clear the ARMED bit if a notification is needed
From: Steve Wise <swise(a)opengridcomputing.com>
commit 335ebf6fa35ca1c59b73f76fad19b249d3550e86 upstream.
In __flush_qp(), the CQ ARMED bit was being cleared regardless of
whether any notification is actually needed. This resulted in the iser
termination logic getting stuck in ib_drain_sq() because the CQ was not
marked ARMED and thus the drain CQE notification wasn't triggered.
This new bug was exposed when this commit was merged:
commit cbb40fadd31c ("iw_cxgb4: only call the cq comp_handler when the
cq is armed")
Signed-off-by: Steve Wise <swise(a)opengridcomputing.com>
Signed-off-by: Jason Gunthorpe <jgg(a)mellanox.com>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/infiniband/hw/cxgb4/qp.c | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)
--- a/drivers/infiniband/hw/cxgb4/qp.c
+++ b/drivers/infiniband/hw/cxgb4/qp.c
@@ -1301,21 +1301,21 @@ static void __flush_qp(struct c4iw_qp *q
spin_unlock_irqrestore(&rchp->lock, flag);
if (schp == rchp) {
- if (t4_clear_cq_armed(&rchp->cq) &&
- (rq_flushed || sq_flushed)) {
+ if ((rq_flushed || sq_flushed) &&
+ t4_clear_cq_armed(&rchp->cq)) {
spin_lock_irqsave(&rchp->comp_handler_lock, flag);
(*rchp->ibcq.comp_handler)(&rchp->ibcq,
rchp->ibcq.cq_context);
spin_unlock_irqrestore(&rchp->comp_handler_lock, flag);
}
} else {
- if (t4_clear_cq_armed(&rchp->cq) && rq_flushed) {
+ if (rq_flushed && t4_clear_cq_armed(&rchp->cq)) {
spin_lock_irqsave(&rchp->comp_handler_lock, flag);
(*rchp->ibcq.comp_handler)(&rchp->ibcq,
rchp->ibcq.cq_context);
spin_unlock_irqrestore(&rchp->comp_handler_lock, flag);
}
- if (t4_clear_cq_armed(&schp->cq) && sq_flushed) {
+ if (sq_flushed && t4_clear_cq_armed(&schp->cq)) {
spin_lock_irqsave(&schp->comp_handler_lock, flag);
(*schp->ibcq.comp_handler)(&schp->ibcq,
schp->ibcq.cq_context);
Patches currently in stable-queue which might be from swise(a)opengridcomputing.com are
queue-4.14/iw_cxgb4-only-clear-the-armed-bit-if-a-notification-is-needed.patch
queue-4.14/iw_cxgb4-atomically-flush-the-qp.patch
queue-4.14/iw_cxgb4-when-flushing-complete-all-wrs-in-a-chain.patch
queue-4.14/iw_cxgb4-reflect-the-original-wr-opcode-in-drain-cqes.patch
queue-4.14/iw_cxgb4-only-call-the-cq-comp_handler-when-the-cq-is-armed.patch
This is a note to let you know that I've just added the patch titled
iw_cxgb4: atomically flush the qp
to the 4.14-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
iw_cxgb4-atomically-flush-the-qp.patch
and it can be found in the queue-4.14 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From bc52e9ca74b9a395897bb640c6671b2cbf716032 Mon Sep 17 00:00:00 2001
From: Steve Wise <swise(a)opengridcomputing.com>
Date: Thu, 9 Nov 2017 07:21:26 -0800
Subject: iw_cxgb4: atomically flush the qp
From: Steve Wise <swise(a)opengridcomputing.com>
commit bc52e9ca74b9a395897bb640c6671b2cbf716032 upstream.
__flush_qp() has a race condition where during the flush operation,
the qp lock is released allowing another thread to possibly post a WR,
which corrupts the queue state, possibly causing crashes. The lock was
released to preserve the cq/qp locking hierarchy of cq first, then qp.
However releasing the qp lock is not necessary; both RQ and SQ CQ locks
can be acquired first, followed by the qp lock, and then the RQ and SQ
flushing can be done w/o unlocking.
Signed-off-by: Steve Wise <swise(a)opengridcomputing.com>
Signed-off-by: Doug Ledford <dledford(a)redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/infiniband/hw/cxgb4/qp.c | 19 +++++++++++--------
1 file changed, 11 insertions(+), 8 deletions(-)
--- a/drivers/infiniband/hw/cxgb4/qp.c
+++ b/drivers/infiniband/hw/cxgb4/qp.c
@@ -1271,31 +1271,34 @@ static void __flush_qp(struct c4iw_qp *q
pr_debug("%s qhp %p rchp %p schp %p\n", __func__, qhp, rchp, schp);
- /* locking hierarchy: cq lock first, then qp lock. */
+ /* locking hierarchy: cqs lock first, then qp lock. */
spin_lock_irqsave(&rchp->lock, flag);
+ if (schp != rchp)
+ spin_lock(&schp->lock);
spin_lock(&qhp->lock);
if (qhp->wq.flushed) {
spin_unlock(&qhp->lock);
+ if (schp != rchp)
+ spin_unlock(&schp->lock);
spin_unlock_irqrestore(&rchp->lock, flag);
return;
}
qhp->wq.flushed = 1;
+ t4_set_wq_in_error(&qhp->wq);
c4iw_flush_hw_cq(rchp);
c4iw_count_rcqes(&rchp->cq, &qhp->wq, &count);
rq_flushed = c4iw_flush_rq(&qhp->wq, &rchp->cq, count);
- spin_unlock(&qhp->lock);
- spin_unlock_irqrestore(&rchp->lock, flag);
- /* locking hierarchy: cq lock first, then qp lock. */
- spin_lock_irqsave(&schp->lock, flag);
- spin_lock(&qhp->lock);
if (schp != rchp)
c4iw_flush_hw_cq(schp);
sq_flushed = c4iw_flush_sq(qhp);
+
spin_unlock(&qhp->lock);
- spin_unlock_irqrestore(&schp->lock, flag);
+ if (schp != rchp)
+ spin_unlock(&schp->lock);
+ spin_unlock_irqrestore(&rchp->lock, flag);
if (schp == rchp) {
if (t4_clear_cq_armed(&rchp->cq) &&
@@ -1329,8 +1332,8 @@ static void flush_qp(struct c4iw_qp *qhp
rchp = to_c4iw_cq(qhp->ibqp.recv_cq);
schp = to_c4iw_cq(qhp->ibqp.send_cq);
- t4_set_wq_in_error(&qhp->wq);
if (qhp->ibqp.uobject) {
+ t4_set_wq_in_error(&qhp->wq);
t4_set_cq_in_error(&rchp->cq);
spin_lock_irqsave(&rchp->comp_handler_lock, flag);
(*rchp->ibcq.comp_handler)(&rchp->ibcq, rchp->ibcq.cq_context);
Patches currently in stable-queue which might be from swise(a)opengridcomputing.com are
queue-4.14/iw_cxgb4-only-clear-the-armed-bit-if-a-notification-is-needed.patch
queue-4.14/iw_cxgb4-atomically-flush-the-qp.patch
queue-4.14/iw_cxgb4-when-flushing-complete-all-wrs-in-a-chain.patch
queue-4.14/iw_cxgb4-reflect-the-original-wr-opcode-in-drain-cqes.patch
queue-4.14/iw_cxgb4-only-call-the-cq-comp_handler-when-the-cq-is-armed.patch
This is a note to let you know that I've just added the patch titled
iw_cxgb4: only call the cq comp_handler when the cq is armed
to the 4.14-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
iw_cxgb4-only-call-the-cq-comp_handler-when-the-cq-is-armed.patch
and it can be found in the queue-4.14 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From cbb40fadd31c6bbc59104e58ac95c6ef492d038b Mon Sep 17 00:00:00 2001
From: Steve Wise <swise(a)opengridcomputing.com>
Date: Thu, 9 Nov 2017 07:14:43 -0800
Subject: iw_cxgb4: only call the cq comp_handler when the cq is armed
From: Steve Wise <swise(a)opengridcomputing.com>
commit cbb40fadd31c6bbc59104e58ac95c6ef492d038b upstream.
The ULPs completion handler should only be called if the CQ is
armed for notification.
Signed-off-by: Steve Wise <swise(a)opengridcomputing.com>
Signed-off-by: Doug Ledford <dledford(a)redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/infiniband/hw/cxgb4/ev.c | 8 +++++---
drivers/infiniband/hw/cxgb4/qp.c | 20 ++++++++++++--------
2 files changed, 17 insertions(+), 11 deletions(-)
--- a/drivers/infiniband/hw/cxgb4/ev.c
+++ b/drivers/infiniband/hw/cxgb4/ev.c
@@ -109,9 +109,11 @@ static void post_qp_event(struct c4iw_de
if (qhp->ibqp.event_handler)
(*qhp->ibqp.event_handler)(&event, qhp->ibqp.qp_context);
- spin_lock_irqsave(&chp->comp_handler_lock, flag);
- (*chp->ibcq.comp_handler)(&chp->ibcq, chp->ibcq.cq_context);
- spin_unlock_irqrestore(&chp->comp_handler_lock, flag);
+ if (t4_clear_cq_armed(&chp->cq)) {
+ spin_lock_irqsave(&chp->comp_handler_lock, flag);
+ (*chp->ibcq.comp_handler)(&chp->ibcq, chp->ibcq.cq_context);
+ spin_unlock_irqrestore(&chp->comp_handler_lock, flag);
+ }
}
void c4iw_ev_dispatch(struct c4iw_dev *dev, struct t4_cqe *err_cqe)
--- a/drivers/infiniband/hw/cxgb4/qp.c
+++ b/drivers/infiniband/hw/cxgb4/qp.c
@@ -817,10 +817,12 @@ static void complete_sq_drain_wr(struct
t4_swcq_produce(cq);
spin_unlock_irqrestore(&schp->lock, flag);
- spin_lock_irqsave(&schp->comp_handler_lock, flag);
- (*schp->ibcq.comp_handler)(&schp->ibcq,
- schp->ibcq.cq_context);
- spin_unlock_irqrestore(&schp->comp_handler_lock, flag);
+ if (t4_clear_cq_armed(&schp->cq)) {
+ spin_lock_irqsave(&schp->comp_handler_lock, flag);
+ (*schp->ibcq.comp_handler)(&schp->ibcq,
+ schp->ibcq.cq_context);
+ spin_unlock_irqrestore(&schp->comp_handler_lock, flag);
+ }
}
static void complete_rq_drain_wr(struct c4iw_qp *qhp, struct ib_recv_wr *wr)
@@ -846,10 +848,12 @@ static void complete_rq_drain_wr(struct
t4_swcq_produce(cq);
spin_unlock_irqrestore(&rchp->lock, flag);
- spin_lock_irqsave(&rchp->comp_handler_lock, flag);
- (*rchp->ibcq.comp_handler)(&rchp->ibcq,
- rchp->ibcq.cq_context);
- spin_unlock_irqrestore(&rchp->comp_handler_lock, flag);
+ if (t4_clear_cq_armed(&rchp->cq)) {
+ spin_lock_irqsave(&rchp->comp_handler_lock, flag);
+ (*rchp->ibcq.comp_handler)(&rchp->ibcq,
+ rchp->ibcq.cq_context);
+ spin_unlock_irqrestore(&rchp->comp_handler_lock, flag);
+ }
}
int c4iw_post_send(struct ib_qp *ibqp, struct ib_send_wr *wr,
Patches currently in stable-queue which might be from swise(a)opengridcomputing.com are
queue-4.14/iw_cxgb4-only-clear-the-armed-bit-if-a-notification-is-needed.patch
queue-4.14/iw_cxgb4-atomically-flush-the-qp.patch
queue-4.14/iw_cxgb4-when-flushing-complete-all-wrs-in-a-chain.patch
queue-4.14/iw_cxgb4-reflect-the-original-wr-opcode-in-drain-cqes.patch
queue-4.14/iw_cxgb4-only-call-the-cq-comp_handler-when-the-cq-is-armed.patch
I'm resending with a cover letter as I realize I didn't provide
enough info.
This series fixes critical qp flush bugs in iw_cxgb4. Pleaes apply
to 4.14-stable.
Thanks,
Steve
----
Steve Wise (5):
iw_cxgb4: only call the cq comp_handler when the cq is armed
iw_cxgb4: atomically flush the qp
iw_cxgb4: only clear the ARMED bit if a notification is needed
iw_cxgb4: reflect the original WR opcode in drain cqes
iw_cxgb4: when flushing, complete all wrs in a chain
drivers/infiniband/hw/cxgb4/cq.c | 7 +-
drivers/infiniband/hw/cxgb4/ev.c | 8 ++-
drivers/infiniband/hw/cxgb4/iw_cxgb4.h | 2 -
drivers/infiniband/hw/cxgb4/qp.c | 119 ++++++++++++++++++++++++++-------
drivers/infiniband/hw/cxgb4/t4.h | 6 ++
5 files changed, 107 insertions(+), 35 deletions(-)
--
1.8.3.1
When saving BOs in the hang state we skip one entry of the
kernel_state->bo[] array, thus leaving it to NULL. This leads to a NULL
pointer dereference when, later in this function, we iterate over all
BOs to check their ->madv state.
Fixes: ca26d28bbaa3 ("drm/vc4: improve throughput by pipelining binning and rendering jobs")
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Boris Brezillon <boris.brezillon(a)free-electrons.com>
---
drivers/gpu/drm/vc4/vc4_gem.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/vc4/vc4_gem.c b/drivers/gpu/drm/vc4/vc4_gem.c
index 6c32c89a83a9..19ac7fe0e5db 100644
--- a/drivers/gpu/drm/vc4/vc4_gem.c
+++ b/drivers/gpu/drm/vc4/vc4_gem.c
@@ -208,7 +208,7 @@ vc4_save_hang_state(struct drm_device *dev)
kernel_state->bo[j + prev_idx] = &bo->base.base;
j++;
}
- prev_idx = j + 1;
+ prev_idx = j;
}
if (exec[0])
--
2.11.0