When the allocation process is scheduled back and the mapped hw queue is changed, do one extra wake up on the original queue to compensate for the missed wake up, so other allocations on the original queue won't be starved.
This patch fixes a request allocation hang, which can be triggered easily when nr_requests is very low.
Cc: stable@vger.kernel.org
Cc: Omar Sandoval <osandov@fb.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
V2: fix build failure
 block/blk-mq-tag.c      | 13 +++++++++++++
 include/linux/sbitmap.h |  7 +++++++
 lib/sbitmap.c           |  6 ++++++
 3 files changed, 26 insertions(+)
diff --git a/block/blk-mq-tag.c b/block/blk-mq-tag.c
index 336dde07b230..77607f89d205 100644
--- a/block/blk-mq-tag.c
+++ b/block/blk-mq-tag.c
@@ -134,6 +134,8 @@ unsigned int blk_mq_get_tag(struct blk_mq_alloc_data *data)
 	ws = bt_wait_ptr(bt, data->hctx);
 	drop_ctx = data->ctx == NULL;
 	do {
+		struct sbitmap_queue *bt_orig;
+
 		/*
 		 * We're out of tags on this hardware queue, kick any
 		 * pending IO submits before going to sleep waiting for
@@ -159,6 +161,7 @@ unsigned int blk_mq_get_tag(struct blk_mq_alloc_data *data)
 		if (data->ctx)
 			blk_mq_put_ctx(data->ctx);

+		bt_orig = bt;
 		io_schedule();

 		data->ctx = blk_mq_get_ctx(data->q);
@@ -170,6 +173,16 @@ unsigned int blk_mq_get_tag(struct blk_mq_alloc_data *data)
 			bt = &tags->bitmap_tags;

 		finish_wait(&ws->wait, &wait);
+
+		/*
+		 * If destination hw queue is changed, wake up original
+		 * queue one extra time for compensating the wake up
+		 * miss, so other allocations on original queue won't
+		 * be starved.
+		 */
+		if (bt != bt_orig)
+			sbitmap_queue_wake_up(bt_orig);
+
 		ws = bt_wait_ptr(bt, data->hctx);
 	} while (1);
diff --git a/include/linux/sbitmap.h b/include/linux/sbitmap.h
index 841585f6e5f2..b23f50355281 100644
--- a/include/linux/sbitmap.h
+++ b/include/linux/sbitmap.h
@@ -484,6 +484,13 @@ static inline struct sbq_wait_state *sbq_wait_ptr(struct sbitmap_queue *sbq,
 void sbitmap_queue_wake_all(struct sbitmap_queue *sbq);

 /**
+ * sbitmap_queue_wake_up() - Do a regular wake up compensation if the queue
+ * allocated from is changed after scheduling back.
+ * @sbq: Bitmap queue to wake up.
+ */
+void sbitmap_queue_wake_up(struct sbitmap_queue *sbq);
+
+/**
  * sbitmap_queue_show() - Dump &struct sbitmap_queue information to a &struct
  * seq_file.
  * @sbq: Bitmap queue to show.
diff --git a/lib/sbitmap.c b/lib/sbitmap.c
index e6a9c06ec70c..c6ae4206bcb1 100644
--- a/lib/sbitmap.c
+++ b/lib/sbitmap.c
@@ -466,6 +466,12 @@ static void sbq_wake_up(struct sbitmap_queue *sbq)
 	}
 }

+void sbitmap_queue_wake_up(struct sbitmap_queue *sbq)
+{
+	sbq_wake_up(sbq);
+}
+EXPORT_SYMBOL_GPL(sbitmap_queue_wake_up);
+
 void sbitmap_queue_clear(struct sbitmap_queue *sbq, unsigned int nr,
 			 unsigned int cpu)
 {
On 5/19/18 1:44 AM, Ming Lei wrote:
When the allocation process is scheduled back and the mapped hw queue is changed, do one extra wake up on the original queue to compensate for the missed wake up, so other allocations on the original queue won't be starved.
This patch fixes a request allocation hang, which can be triggered easily when nr_requests is very low.
Can you explain what the actual bug is? The commit message doesn't really say, it just explains what the code change does. What wake up miss?
On Tue, May 22, 2018 at 02:20:37PM -0600, Jens Axboe wrote:
On 5/19/18 1:44 AM, Ming Lei wrote:
When the allocation process is scheduled back and the mapped hw queue is changed, do one extra wake up on the original queue to compensate for the missed wake up, so other allocations on the original queue won't be starved.
This patch fixes a request allocation hang, which can be triggered easily when nr_requests is very low.
Can you explain what the actual bug is? The commit message doesn't really say, it just explains what the code change does. What wake up miss?
For example:
1) one request queue has 2 queues, and the queue depth is low, so wake_batch is set to 1 or very small.
2) There are 2 allocations blocked on hw queue 0; now the last request mapped to hw queue 0 completes, and sbq_wake_up() wakes only one of the 2 blockers.
3) the woken-up allocation process is migrated to another CPU, where it maps to hw queue 1, so the allocation switches to hw queue 1.
4) so there isn't any chance for the other blocked allocation to move on, since all requests in hw queue 0 have been completed. That is why I call it a wake up miss.
BTW, this issue can be reproduced reliably by the block/020 I posted yesterday:
https://marc.info/?l=linux-block&m=152698758418536&w=2
Thanks, Ming
On 5/22/18 6:22 PM, Ming Lei wrote:
On Tue, May 22, 2018 at 02:20:37PM -0600, Jens Axboe wrote:
On 5/19/18 1:44 AM, Ming Lei wrote:
When the allocation process is scheduled back and the mapped hw queue is changed, do one extra wake up on the original queue to compensate for the missed wake up, so other allocations on the original queue won't be starved.
This patch fixes a request allocation hang, which can be triggered easily when nr_requests is very low.
Can you explain what the actual bug is? The commit message doesn't really say, it just explains what the code change does. What wake up miss?
For example:
- one request queue has 2 queues, and the queue depth is low, so wake_batch is set to 1 or very small.
We have too many 'queues' :)
- There are 2 allocations blocked on hw queue 0; now the last request mapped to hw queue 0 completes, and sbq_wake_up() wakes only one of the 2 blockers.
- the woken-up allocation process is migrated to another CPU, where it maps to hw queue 1, so the allocation switches to hw queue 1.
- so there isn't any chance for the other blocked allocation to move on, since all requests in hw queue 0 have been completed. That is why I call it a wake up miss.
OK, that makes sense to me. With that, let me review your patch in a different email.
On Sat, May 19, 2018 at 03:44:06PM +0800, Ming Lei wrote:
When the allocation process is scheduled back and the mapped hw queue is changed, do one extra wake up on the original queue to compensate for the missed wake up, so other allocations on the original queue won't be starved.
This patch fixes a request allocation hang, which can be triggered easily when nr_requests is very low.
Cc: stable@vger.kernel.org
Cc: Omar Sandoval <osandov@fb.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
V2: fix build failure

 block/blk-mq-tag.c      | 13 +++++++++++++
 include/linux/sbitmap.h |  7 +++++++
 lib/sbitmap.c           |  6 ++++++
 3 files changed, 26 insertions(+)
diff --git a/block/blk-mq-tag.c b/block/blk-mq-tag.c
index 336dde07b230..77607f89d205 100644
--- a/block/blk-mq-tag.c
+++ b/block/blk-mq-tag.c
@@ -134,6 +134,8 @@ unsigned int blk_mq_get_tag(struct blk_mq_alloc_data *data)
 	ws = bt_wait_ptr(bt, data->hctx);
 	drop_ctx = data->ctx == NULL;
 	do {
+		struct sbitmap_queue *bt_orig;
+
 		/*
 		 * We're out of tags on this hardware queue, kick any
 		 * pending IO submits before going to sleep waiting for
@@ -159,6 +161,7 @@ unsigned int blk_mq_get_tag(struct blk_mq_alloc_data *data)
 		if (data->ctx)
 			blk_mq_put_ctx(data->ctx);

+		bt_orig = bt;
 		io_schedule();

 		data->ctx = blk_mq_get_ctx(data->q);
@@ -170,6 +173,16 @@ unsigned int blk_mq_get_tag(struct blk_mq_alloc_data *data)
 			bt = &tags->bitmap_tags;

 		finish_wait(&ws->wait, &wait);
+
+		/*
+		 * If destination hw queue is changed, wake up original
+		 * queue one extra time for compensating the wake up
+		 * miss, so other allocations on original queue won't
+		 * be starved.
+		 */
+		if (bt != bt_orig)
+			sbitmap_queue_wake_up(bt_orig);
+
 		ws = bt_wait_ptr(bt, data->hctx);
 	} while (1);
diff --git a/include/linux/sbitmap.h b/include/linux/sbitmap.h
index 841585f6e5f2..b23f50355281 100644
--- a/include/linux/sbitmap.h
+++ b/include/linux/sbitmap.h
@@ -484,6 +484,13 @@ static inline struct sbq_wait_state *sbq_wait_ptr(struct sbitmap_queue *sbq,
 void sbitmap_queue_wake_all(struct sbitmap_queue *sbq);

 /**
+ * sbitmap_queue_wake_up() - Do a regular wake up compensation if the queue
+ * allocated from is changed after scheduling back.
+ * @sbq: Bitmap queue to wake up.
+ */
+void sbitmap_queue_wake_up(struct sbitmap_queue *sbq);
+
+/**
  * sbitmap_queue_show() - Dump &struct sbitmap_queue information to a &struct
  * seq_file.
  * @sbq: Bitmap queue to show.
diff --git a/lib/sbitmap.c b/lib/sbitmap.c
index e6a9c06ec70c..c6ae4206bcb1 100644
--- a/lib/sbitmap.c
+++ b/lib/sbitmap.c
@@ -466,6 +466,12 @@ static void sbq_wake_up(struct sbitmap_queue *sbq)
 	}
 }

+void sbitmap_queue_wake_up(struct sbitmap_queue *sbq)
+{
+	sbq_wake_up(sbq);
+}
+EXPORT_SYMBOL_GPL(sbitmap_queue_wake_up);
+
 void sbitmap_queue_clear(struct sbitmap_queue *sbq, unsigned int nr,
 			 unsigned int cpu)
 {
There's no reason to wrap an internal helper with a function taking the same arguments; just rename sbq_wake_up() and export it.
On Thu, May 24, 2018 at 1:40 AM, Omar Sandoval osandov@osandov.com wrote:
On Sat, May 19, 2018 at 03:44:06PM +0800, Ming Lei wrote:
When the allocation process is scheduled back and the mapped hw queue is changed, do one extra wake up on the original queue to compensate for the missed wake up, so other allocations on the original queue won't be starved.
This patch fixes a request allocation hang, which can be triggered easily when nr_requests is very low.
Cc: stable@vger.kernel.org
Cc: Omar Sandoval <osandov@fb.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
V2: fix build failure

 block/blk-mq-tag.c      | 13 +++++++++++++
 include/linux/sbitmap.h |  7 +++++++
 lib/sbitmap.c           |  6 ++++++
 3 files changed, 26 insertions(+)
diff --git a/block/blk-mq-tag.c b/block/blk-mq-tag.c
index 336dde07b230..77607f89d205 100644
--- a/block/blk-mq-tag.c
+++ b/block/blk-mq-tag.c
@@ -134,6 +134,8 @@ unsigned int blk_mq_get_tag(struct blk_mq_alloc_data *data)
 	ws = bt_wait_ptr(bt, data->hctx);
 	drop_ctx = data->ctx == NULL;
 	do {
+		struct sbitmap_queue *bt_orig;
+
 		/*
 		 * We're out of tags on this hardware queue, kick any
 		 * pending IO submits before going to sleep waiting for
@@ -159,6 +161,7 @@ unsigned int blk_mq_get_tag(struct blk_mq_alloc_data *data)
 		if (data->ctx)
 			blk_mq_put_ctx(data->ctx);

+		bt_orig = bt;
 		io_schedule();

 		data->ctx = blk_mq_get_ctx(data->q);
@@ -170,6 +173,16 @@ unsigned int blk_mq_get_tag(struct blk_mq_alloc_data *data)
 			bt = &tags->bitmap_tags;

 		finish_wait(&ws->wait, &wait);
+
+		/*
+		 * If destination hw queue is changed, wake up original
+		 * queue one extra time for compensating the wake up
+		 * miss, so other allocations on original queue won't
+		 * be starved.
+		 */
+		if (bt != bt_orig)
+			sbitmap_queue_wake_up(bt_orig);
+
 		ws = bt_wait_ptr(bt, data->hctx);
 	} while (1);
diff --git a/include/linux/sbitmap.h b/include/linux/sbitmap.h
index 841585f6e5f2..b23f50355281 100644
--- a/include/linux/sbitmap.h
+++ b/include/linux/sbitmap.h
@@ -484,6 +484,13 @@ static inline struct sbq_wait_state *sbq_wait_ptr(struct sbitmap_queue *sbq,
 void sbitmap_queue_wake_all(struct sbitmap_queue *sbq);

 /**
+ * sbitmap_queue_wake_up() - Do a regular wake up compensation if the queue
+ * allocated from is changed after scheduling back.
+ * @sbq: Bitmap queue to wake up.
+ */
+void sbitmap_queue_wake_up(struct sbitmap_queue *sbq);
+
+/**
  * sbitmap_queue_show() - Dump &struct sbitmap_queue information to a &struct
  * seq_file.
  * @sbq: Bitmap queue to show.
diff --git a/lib/sbitmap.c b/lib/sbitmap.c
index e6a9c06ec70c..c6ae4206bcb1 100644
--- a/lib/sbitmap.c
+++ b/lib/sbitmap.c
@@ -466,6 +466,12 @@ static void sbq_wake_up(struct sbitmap_queue *sbq)
 	}
 }

+void sbitmap_queue_wake_up(struct sbitmap_queue *sbq)
+{
+	sbq_wake_up(sbq);
+}
+EXPORT_SYMBOL_GPL(sbitmap_queue_wake_up);
+
 void sbitmap_queue_clear(struct sbitmap_queue *sbq, unsigned int nr,
 			 unsigned int cpu)
 {
There's no reason to wrap an internal helper with a function taking the same arguments; just rename sbq_wake_up() and export it.
This way sbq_wake_up() may stay inlined into sbitmap_queue_clear(), so we won't introduce any cost in the fast path.
Thanks, Ming Lei