The patch below does not apply to the 4.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 42ffb0bf584ae5b6b38f72259af1e0ee417ac77f Mon Sep 17 00:00:00 2001
From: Josef Bacik <josef(a)toxicpanda.com>
Date: Thu, 23 Jan 2020 15:33:02 -0500
Subject: [PATCH] btrfs: flush write bio if we loop in extent_write_cache_pages
There exists a deadlock with range_cyclic that has existed forever. If
we loop around with a bio already built we could deadlock with a writer
who has the page locked that we're attempting to write but is waiting on
a page in our bio to be written out. The task traces are as follows
PID: 1329874 TASK: ffff889ebcdf3800 CPU: 33 COMMAND: "kworker/u113:5"
#0 [ffffc900297bb658] __schedule at ffffffff81a4c33f
#1 [ffffc900297bb6e0] schedule at ffffffff81a4c6e3
#2 [ffffc900297bb6f8] io_schedule at ffffffff81a4ca42
#3 [ffffc900297bb708] __lock_page at ffffffff811f145b
#4 [ffffc900297bb798] __process_pages_contig at ffffffff814bc502
#5 [ffffc900297bb8c8] lock_delalloc_pages at ffffffff814bc684
#6 [ffffc900297bb900] find_lock_delalloc_range at ffffffff814be9ff
#7 [ffffc900297bb9a0] writepage_delalloc at ffffffff814bebd0
#8 [ffffc900297bba18] __extent_writepage at ffffffff814bfbf2
#9 [ffffc900297bba98] extent_write_cache_pages at ffffffff814bffbd
PID: 2167901 TASK: ffff889dc6a59c00 CPU: 14 COMMAND:
"aio-dio-invalid"
#0 [ffffc9003b50bb18] __schedule at ffffffff81a4c33f
#1 [ffffc9003b50bba0] schedule at ffffffff81a4c6e3
#2 [ffffc9003b50bbb8] io_schedule at ffffffff81a4ca42
#3 [ffffc9003b50bbc8] wait_on_page_bit at ffffffff811f24d6
#4 [ffffc9003b50bc60] prepare_pages at ffffffff814b05a7
#5 [ffffc9003b50bcd8] btrfs_buffered_write at ffffffff814b1359
#6 [ffffc9003b50bdb0] btrfs_file_write_iter at ffffffff814b5933
#7 [ffffc9003b50be38] new_sync_write at ffffffff8128f6a8
#8 [ffffc9003b50bec8] vfs_write at ffffffff81292b9d
#9 [ffffc9003b50bf00] ksys_pwrite64 at ffffffff81293032
I used drgn to find the respective pages we were stuck on
page_entry.page 0xffffea00fbfc7500 index 8148 bit 15 pid 2167901
page_entry.page 0xffffea00f9bb7400 index 7680 bit 0 pid 1329874
As you can see the kworker is waiting for bit 0 (PG_locked) on index
7680, and aio-dio-invalid is waiting for bit 15 (PG_writeback) on index
8148. aio-dio-invalid has 7680, and the kworker epd looks like the
following
crash> struct extent_page_data ffffc900297bbbb0
struct extent_page_data {
bio = 0xffff889f747ed830,
tree = 0xffff889eed6ba448,
extent_locked = 0,
sync_io = 0
}
Probably worth mentioning as well that it waits for writeback of the
page to complete while holding a lock on it (at prepare_pages()).
Using drgn I walked the bio pages looking for page
0xffffea00fbfc7500 which is the one we're waiting for writeback on
bio = Object(prog, 'struct bio', address=0xffff889f747ed830)
for i in range(0, bio.bi_vcnt.value_()):
bv = bio.bi_io_vec[i]
if bv.bv_page.value_() == 0xffffea00fbfc7500:
print("FOUND IT")
which validated what I suspected.
The fix for this is simple, flush the epd before we loop back around to
the beginning of the file during writeout.
Fixes: b293f02e1423 ("Btrfs: Add writepages support")
CC: stable(a)vger.kernel.org # 4.4+
Reviewed-by: Filipe Manana <fdmanana(a)suse.com>
Signed-off-by: Josef Bacik <josef(a)toxicpanda.com>
Signed-off-by: David Sterba <dsterba(a)suse.com>
diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index e2d30287e2d5..8ff17bc30d5a 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -4166,7 +4166,16 @@ static int extent_write_cache_pages(struct address_space *mapping,
*/
scanned = 1;
index = 0;
- goto retry;
+
+ /*
+ * If we're looping we could run into a page that is locked by a
+ * writer and that writer could be waiting on writeback for a
+ * page in our current bio, and thus deadlock, so flush the
+ * write bio here.
+ */
+ ret = flush_write_bio(epd);
+ if (!ret)
+ goto retry;
}
if (wbc->range_cyclic || (wbc->nr_to_write > 0 && range_whole))
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From f0b493e6b9a8959356983f57112229e69c2f7b8c Mon Sep 17 00:00:00 2001
From: Jens Axboe <axboe(a)kernel.dk>
Date: Sat, 1 Feb 2020 21:30:11 -0700
Subject: [PATCH] io_uring: prevent potential eventfd recursion on poll
If we have nested or circular eventfd wakeups, then we can deadlock if
we run them inline from our poll waitqueue wakeup handler. It's also
possible to have very long chains of notifications, to the extent where
we could risk blowing the stack.
Check the eventfd recursion count before calling eventfd_signal(). If
it's non-zero, then punt the signaling to async context. This is always
safe, as it takes us out-of-line in terms of stack and locking context.
Cc: stable(a)vger.kernel.org # 5.1+
Signed-off-by: Jens Axboe <axboe(a)kernel.dk>
diff --git a/fs/io_uring.c b/fs/io_uring.c
index 217721c7bc41..43f3b7d90299 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -1020,21 +1020,28 @@ static struct io_uring_cqe *io_get_cqring(struct io_ring_ctx *ctx)
static inline bool io_should_trigger_evfd(struct io_ring_ctx *ctx)
{
+ if (!ctx->cq_ev_fd)
+ return false;
if (!ctx->eventfd_async)
return true;
return io_wq_current_is_worker() || in_interrupt();
}
-static void io_cqring_ev_posted(struct io_ring_ctx *ctx)
+static void __io_cqring_ev_posted(struct io_ring_ctx *ctx, bool trigger_ev)
{
if (waitqueue_active(&ctx->wait))
wake_up(&ctx->wait);
if (waitqueue_active(&ctx->sqo_wait))
wake_up(&ctx->sqo_wait);
- if (ctx->cq_ev_fd && io_should_trigger_evfd(ctx))
+ if (trigger_ev)
eventfd_signal(ctx->cq_ev_fd, 1);
}
+static void io_cqring_ev_posted(struct io_ring_ctx *ctx)
+{
+ __io_cqring_ev_posted(ctx, io_should_trigger_evfd(ctx));
+}
+
/* Returns true if there are no backlogged entries after the flush */
static bool io_cqring_overflow_flush(struct io_ring_ctx *ctx, bool force)
{
@@ -3561,6 +3568,14 @@ static void io_poll_flush(struct io_wq_work **workptr)
__io_poll_flush(req->ctx, nodes);
}
+static void io_poll_trigger_evfd(struct io_wq_work **workptr)
+{
+ struct io_kiocb *req = container_of(*workptr, struct io_kiocb, work);
+
+ eventfd_signal(req->ctx->cq_ev_fd, 1);
+ io_put_req(req);
+}
+
static int io_poll_wake(struct wait_queue_entry *wait, unsigned mode, int sync,
void *key)
{
@@ -3586,14 +3601,22 @@ static int io_poll_wake(struct wait_queue_entry *wait, unsigned mode, int sync,
if (llist_empty(&ctx->poll_llist) &&
spin_trylock_irqsave(&ctx->completion_lock, flags)) {
+ bool trigger_ev;
+
hash_del(&req->hash_node);
io_poll_complete(req, mask, 0);
- req->flags |= REQ_F_COMP_LOCKED;
- io_put_req(req);
- spin_unlock_irqrestore(&ctx->completion_lock, flags);
- io_cqring_ev_posted(ctx);
- req = NULL;
+ trigger_ev = io_should_trigger_evfd(ctx);
+ if (trigger_ev && eventfd_signal_count()) {
+ trigger_ev = false;
+ req->work.func = io_poll_trigger_evfd;
+ } else {
+ req->flags |= REQ_F_COMP_LOCKED;
+ io_put_req(req);
+ req = NULL;
+ }
+ spin_unlock_irqrestore(&ctx->completion_lock, flags);
+ __io_cqring_ev_posted(ctx, trigger_ev);
} else {
req->result = mask;
req->llist_node.next = NULL;
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From d51224b73d18d207912f15ad4eb7a4b456682729 Mon Sep 17 00:00:00 2001
From: Hans Verkuil <hverkuil-cisco(a)xs4all.nl>
Date: Wed, 11 Dec 2019 12:47:57 +0100
Subject: [PATCH] media: cec: check 'transmit_in_progress', not 'transmitting'
Currently wait_event_interruptible_timeout is called in cec_thread_func()
when adap->transmitting is set. But if the adapter is unconfigured
while transmitting, then adap->transmitting is set to NULL. But the
hardware is still actually transmitting the message, and that's
indicated by adap->transmit_in_progress and we should wait until that
is finished or times out before transmitting new messages.
As the original commit says: adap->transmitting is the userspace view,
adap->transmit_in_progress reflects the hardware state.
However, if adap->transmitting is NULL and adap->transmit_in_progress
is true, then wait_event_interruptible is called (no timeout), which
can get stuck indefinitely if the CEC driver is flaky and never marks
the transmit-in-progress as 'done'.
So test against transmit_in_progress when deciding whether to use
the timeout variant or not, instead of testing against adap->transmitting.
Signed-off-by: Hans Verkuil <hverkuil-cisco(a)xs4all.nl>
Fixes: 32804fcb612b ("media: cec: keep track of outstanding transmits")
Cc: <stable(a)vger.kernel.org> # for v4.19 and up
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei(a)kernel.org>
diff --git a/drivers/media/cec/cec-adap.c b/drivers/media/cec/cec-adap.c
index 1060e633b623..6c95dc471d4c 100644
--- a/drivers/media/cec/cec-adap.c
+++ b/drivers/media/cec/cec-adap.c
@@ -465,7 +465,7 @@ int cec_thread_func(void *_adap)
bool timeout = false;
u8 attempts;
- if (adap->transmitting) {
+ if (adap->transmit_in_progress) {
int err;
/*
@@ -500,7 +500,7 @@ int cec_thread_func(void *_adap)
goto unlock;
}
- if (adap->transmitting && timeout) {
+ if (adap->transmit_in_progress && timeout) {
/*
* If we timeout, then log that. Normally this does
* not happen and it is an indication of a faulty CEC
@@ -509,14 +509,18 @@ int cec_thread_func(void *_adap)
* so much traffic on the bus that the adapter was
* unable to transmit for CEC_XFER_TIMEOUT_MS (2.1s).
*/
- pr_warn("cec-%s: message %*ph timed out\n", adap->name,
- adap->transmitting->msg.len,
- adap->transmitting->msg.msg);
+ if (adap->transmitting) {
+ pr_warn("cec-%s: message %*ph timed out\n", adap->name,
+ adap->transmitting->msg.len,
+ adap->transmitting->msg.msg);
+ /* Just give up on this. */
+ cec_data_cancel(adap->transmitting,
+ CEC_TX_STATUS_TIMEOUT);
+ } else {
+ pr_warn("cec-%s: transmit timed out\n", adap->name);
+ }
adap->transmit_in_progress = false;
adap->tx_timeouts++;
- /* Just give up on this. */
- cec_data_cancel(adap->transmitting,
- CEC_TX_STATUS_TIMEOUT);
goto unlock;
}
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From d51224b73d18d207912f15ad4eb7a4b456682729 Mon Sep 17 00:00:00 2001
From: Hans Verkuil <hverkuil-cisco(a)xs4all.nl>
Date: Wed, 11 Dec 2019 12:47:57 +0100
Subject: [PATCH] media: cec: check 'transmit_in_progress', not 'transmitting'
Currently wait_event_interruptible_timeout is called in cec_thread_func()
when adap->transmitting is set. But if the adapter is unconfigured
while transmitting, then adap->transmitting is set to NULL. But the
hardware is still actually transmitting the message, and that's
indicated by adap->transmit_in_progress and we should wait until that
is finished or times out before transmitting new messages.
As the original commit says: adap->transmitting is the userspace view,
adap->transmit_in_progress reflects the hardware state.
However, if adap->transmitting is NULL and adap->transmit_in_progress
is true, then wait_event_interruptible is called (no timeout), which
can get stuck indefinitely if the CEC driver is flaky and never marks
the transmit-in-progress as 'done'.
So test against transmit_in_progress when deciding whether to use
the timeout variant or not, instead of testing against adap->transmitting.
Signed-off-by: Hans Verkuil <hverkuil-cisco(a)xs4all.nl>
Fixes: 32804fcb612b ("media: cec: keep track of outstanding transmits")
Cc: <stable(a)vger.kernel.org> # for v4.19 and up
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei(a)kernel.org>
diff --git a/drivers/media/cec/cec-adap.c b/drivers/media/cec/cec-adap.c
index 1060e633b623..6c95dc471d4c 100644
--- a/drivers/media/cec/cec-adap.c
+++ b/drivers/media/cec/cec-adap.c
@@ -465,7 +465,7 @@ int cec_thread_func(void *_adap)
bool timeout = false;
u8 attempts;
- if (adap->transmitting) {
+ if (adap->transmit_in_progress) {
int err;
/*
@@ -500,7 +500,7 @@ int cec_thread_func(void *_adap)
goto unlock;
}
- if (adap->transmitting && timeout) {
+ if (adap->transmit_in_progress && timeout) {
/*
* If we timeout, then log that. Normally this does
* not happen and it is an indication of a faulty CEC
@@ -509,14 +509,18 @@ int cec_thread_func(void *_adap)
* so much traffic on the bus that the adapter was
* unable to transmit for CEC_XFER_TIMEOUT_MS (2.1s).
*/
- pr_warn("cec-%s: message %*ph timed out\n", adap->name,
- adap->transmitting->msg.len,
- adap->transmitting->msg.msg);
+ if (adap->transmitting) {
+ pr_warn("cec-%s: message %*ph timed out\n", adap->name,
+ adap->transmitting->msg.len,
+ adap->transmitting->msg.msg);
+ /* Just give up on this. */
+ cec_data_cancel(adap->transmitting,
+ CEC_TX_STATUS_TIMEOUT);
+ } else {
+ pr_warn("cec-%s: transmit timed out\n", adap->name);
+ }
adap->transmit_in_progress = false;
adap->tx_timeouts++;
- /* Just give up on this. */
- cec_data_cancel(adap->transmitting,
- CEC_TX_STATUS_TIMEOUT);
goto unlock;
}
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 4be77174c3fafc450527a0c383bb2034d2db6a2d Mon Sep 17 00:00:00 2001
From: Hans Verkuil <hverkuil-cisco(a)xs4all.nl>
Date: Sat, 7 Dec 2019 23:48:09 +0100
Subject: [PATCH] media: cec: avoid decrementing transmit_queue_sz if it is 0
WARN if transmit_queue_sz is 0 but do not decrement it.
The CEC adapter will become unresponsive if it goes below
0 since then it thinks there are 4 billion messages in the
queue.
Obviously this should not happen, but a driver bug could
cause this.
Signed-off-by: Hans Verkuil <hverkuil-cisco(a)xs4all.nl>
Cc: <stable(a)vger.kernel.org> # for v4.12 and up
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei(a)kernel.org>
diff --git a/drivers/media/cec/cec-adap.c b/drivers/media/cec/cec-adap.c
index e90c30dac68b..1060e633b623 100644
--- a/drivers/media/cec/cec-adap.c
+++ b/drivers/media/cec/cec-adap.c
@@ -380,7 +380,8 @@ static void cec_data_cancel(struct cec_data *data, u8 tx_status)
} else {
list_del_init(&data->list);
if (!(data->msg.tx_status & CEC_TX_STATUS_OK))
- data->adap->transmit_queue_sz--;
+ if (!WARN_ON(!data->adap->transmit_queue_sz))
+ data->adap->transmit_queue_sz--;
}
if (data->msg.tx_status & CEC_TX_STATUS_OK) {
@@ -432,6 +433,14 @@ static void cec_flush(struct cec_adapter *adap)
* need to do anything special in that case.
*/
}
+ /*
+ * If something went wrong and this counter isn't what it should
+ * be, then this will reset it back to 0. Warn if it is not 0,
+ * since it indicates a bug, either in this framework or in a
+ * CEC driver.
+ */
+ if (WARN_ON(adap->transmit_queue_sz))
+ adap->transmit_queue_sz = 0;
}
/*
@@ -522,7 +531,8 @@ int cec_thread_func(void *_adap)
data = list_first_entry(&adap->transmit_queue,
struct cec_data, list);
list_del_init(&data->list);
- adap->transmit_queue_sz--;
+ if (!WARN_ON(!data->adap->transmit_queue_sz))
+ adap->transmit_queue_sz--;
/* Make this the current transmitting message */
adap->transmitting = data;