SCHED_RESTART code path is relied to re-run queue for dispatch requests in hctx->dispatch. Meantime the SCHED_RSTART flag is checked when adding requests to hctx->dispatch.
memory barriers have to be used for ordering the following two pair of OPs:
1) adding requests to hctx->dispatch and checking SCHED_RESTART in blk_mq_dispatch_rq_list()
2) clearing SCHED_RESTART and checking if there is request in hctx->dispatch in blk_mq_sched_restart().
Without the added memory barrier, either:
1) blk_mq_sched_restart() may miss requests added to hctx->dispatch meantime blk_mq_dispatch_rq_list() observes SCHED_RESTART, and not run queue in dispatch side
or
2) blk_mq_dispatch_rq_list still sees SCHED_RESTART, and not run queue in dispatch side, meantime checking if there is request in hctx->dispatch from blk_mq_sched_restart() is missed.
IO hang in ltp/fs_fill test is reported by kernel test robot:
https://lkml.org/lkml/2020/7/26/77
Turns out it is caused by the above out-of-order OPs. And the IO hang can't be observed any more after applying this patch.
Cc: Bart Van Assche bvanassche@acm.org Cc: Christoph Hellwig hch@lst.de Cc: David Jeffery djeffery@redhat.com Reported-by: kernel test robot rong.a.chen@intel.com Cc: stable@vger.kernel.org Signed-off-by: Ming Lei ming.lei@redhat.com --- block/blk-mq-sched.c | 9 +++++++++ block/blk-mq.c | 9 +++++++++ 2 files changed, 18 insertions(+)
diff --git a/block/blk-mq-sched.c b/block/blk-mq-sched.c index a19cdf159b75..d2790e5b06d1 100644 --- a/block/blk-mq-sched.c +++ b/block/blk-mq-sched.c @@ -78,6 +78,15 @@ void blk_mq_sched_restart(struct blk_mq_hw_ctx *hctx) return; clear_bit(BLK_MQ_S_SCHED_RESTART, &hctx->state);
+ /* + * Order clearing SCHED_RESTART and list_empty_careful(&hctx->dispatch) + * in blk_mq_run_hw_queue(). Its pair is the barrier in + * blk_mq_dispatch_rq_list(). So dispatch code won't see SCHED_RESTART, + * meantime new request added to hctx->dispatch is missed to check in + * blk_mq_run_hw_queue(). + */ + smp_mb(); + blk_mq_run_hw_queue(hctx, true); }
diff --git a/block/blk-mq.c b/block/blk-mq.c index 0015a1892153..6c1c3ad175a9 100644 --- a/block/blk-mq.c +++ b/block/blk-mq.c @@ -1437,6 +1437,15 @@ bool blk_mq_dispatch_rq_list(struct blk_mq_hw_ctx *hctx, struct list_head *list, list_splice_tail_init(list, &hctx->dispatch); spin_unlock(&hctx->lock);
+ /* + * Order adding requests to hctx->dispatch and checking + * SCHED_RESTART flag. The pair of this smp_mb() is the one + * in blk_mq_sched_restart(). Avoid restart code path to + * miss the new added requests to hctx->dispatch, meantime + * SCHED_RESTART is observed here. + */ + smp_mb(); + /* * If SCHED_RESTART was set by the caller of this function and * it is no longer set that means that it was cleared by another
On Mon, Aug 17, 2020 at 06:01:15PM +0800, Ming Lei wrote:
SCHED_RESTART code path is relied to re-run queue for dispatch requests in hctx->dispatch. Meantime the SCHED_RSTART flag is checked when adding requests to hctx->dispatch.
memory barriers have to be used for ordering the following two pair of OPs:
- adding requests to hctx->dispatch and checking SCHED_RESTART in
blk_mq_dispatch_rq_list()
- clearing SCHED_RESTART and checking if there is request in hctx->dispatch
in blk_mq_sched_restart().
Without the added memory barrier, either:
- blk_mq_sched_restart() may miss requests added to hctx->dispatch meantime
blk_mq_dispatch_rq_list() observes SCHED_RESTART, and not run queue in dispatch side
or
- blk_mq_dispatch_rq_list still sees SCHED_RESTART, and not run queue
in dispatch side, meantime checking if there is request in hctx->dispatch from blk_mq_sched_restart() is missed.
IO hang in ltp/fs_fill test is reported by kernel test robot:
https://lkml.org/lkml/2020/7/26/77
Turns out it is caused by the above out-of-order OPs. And the IO hang can't be observed any more after applying this patch.
Cc: Bart Van Assche bvanassche@acm.org Cc: Christoph Hellwig hch@lst.de Cc: David Jeffery djeffery@redhat.com Reported-by: kernel test robot rong.a.chen@intel.com Cc: stable@vger.kernel.org Signed-off-by: Ming Lei ming.lei@redhat.com
Can you add a Fixes: tag so that the commit gets backported?
Otherwise looks good:
Reviewed-by: Christoph Hellwig hch@lst.de
On Mon, Aug 17, 2020 at 12:15:39PM +0200, Christoph Hellwig wrote:
On Mon, Aug 17, 2020 at 06:01:15PM +0800, Ming Lei wrote:
SCHED_RESTART code path is relied to re-run queue for dispatch requests in hctx->dispatch. Meantime the SCHED_RSTART flag is checked when adding requests to hctx->dispatch.
memory barriers have to be used for ordering the following two pair of OPs:
- adding requests to hctx->dispatch and checking SCHED_RESTART in
blk_mq_dispatch_rq_list()
- clearing SCHED_RESTART and checking if there is request in hctx->dispatch
in blk_mq_sched_restart().
Without the added memory barrier, either:
- blk_mq_sched_restart() may miss requests added to hctx->dispatch meantime
blk_mq_dispatch_rq_list() observes SCHED_RESTART, and not run queue in dispatch side
or
- blk_mq_dispatch_rq_list still sees SCHED_RESTART, and not run queue
in dispatch side, meantime checking if there is request in hctx->dispatch from blk_mq_sched_restart() is missed.
IO hang in ltp/fs_fill test is reported by kernel test robot:
https://lkml.org/lkml/2020/7/26/77
Turns out it is caused by the above out-of-order OPs. And the IO hang can't be observed any more after applying this patch.
Cc: Bart Van Assche bvanassche@acm.org Cc: Christoph Hellwig hch@lst.de Cc: David Jeffery djeffery@redhat.com Reported-by: kernel test robot rong.a.chen@intel.com Cc: stable@vger.kernel.org Signed-off-by: Ming Lei ming.lei@redhat.com
Can you add a Fixes: tag so that the commit gets backported?
Fixes: bd166ef183c2 ("blk-mq-sched: add framework for MQ capable IO schedulers")
Thanks, Ming
On 8/17/20 3:01 AM, Ming Lei wrote:
SCHED_RESTART code path is relied to re-run queue for dispatch requests in hctx->dispatch. Meantime the SCHED_RSTART flag is checked when adding requests to hctx->dispatch.
memory barriers have to be used for ordering the following two pair of OPs:
- adding requests to hctx->dispatch and checking SCHED_RESTART in
blk_mq_dispatch_rq_list()
- clearing SCHED_RESTART and checking if there is request in hctx->dispatch
in blk_mq_sched_restart().
Without the added memory barrier, either:
- blk_mq_sched_restart() may miss requests added to hctx->dispatch meantime
blk_mq_dispatch_rq_list() observes SCHED_RESTART, and not run queue in dispatch side
or
- blk_mq_dispatch_rq_list still sees SCHED_RESTART, and not run queue
in dispatch side, meantime checking if there is request in hctx->dispatch from blk_mq_sched_restart() is missed.
IO hang in ltp/fs_fill test is reported by kernel test robot:
https://lkml.org/lkml/2020/7/26/77
Turns out it is caused by the above out-of-order OPs. And the IO hang can't be observed any more after applying this patch.
Applied, thanks.
Hi
[This is an automated email]
This commit has been processed because it contains a -stable tag. The stable tag indicates that it's relevant for the following trees: all
The bot has tested the following trees: v5.8.2, v5.7.16, v5.4.59, v4.19.140, v4.14.193, v4.9.232, v4.4.232.
v5.8.2: Build OK! v5.7.16: Build OK! v5.4.59: Build OK! v4.19.140: Build OK! v4.14.193: Failed to apply! Possible dependencies: 1f460b63d4b3 ("blk-mq: don't restart queue when .get_budget returns BLK_STS_RESOURCE") 358a3a6bccb7 ("blk-mq: don't handle TAG_SHARED in restart") 97889f9ac24f ("blk-mq: remove synchronize_rcu() from blk_mq_del_queue_tag_set()") b347689ffbca ("blk-mq-sched: improve dispatching from sw queue") caf8eb0d604a ("blk-mq-sched: move actual dispatching into one helper") de1482974080 ("blk-mq: introduce .get_budget and .put_budget in blk_mq_ops")
v4.9.232: Failed to apply! Possible dependencies: 8e8320c9315c ("blk-mq: fix performance regression with shared tags") 97889f9ac24f ("blk-mq: remove synchronize_rcu() from blk_mq_del_queue_tag_set()") bd166ef183c2 ("blk-mq-sched: add framework for MQ capable IO schedulers") cf43e6be865a ("block: add scalable completion tracking of requests") e806402130c9 ("block: split out request-only flags into a new namespace") f1ba82616c33 ("blk-mq: pass bio to blk_mq_sched_get_rq_priv")
v4.4.232: Failed to apply! Possible dependencies: 511cbce2ff8b ("irq_poll: make blk-iopoll available outside the block layer") 6f3b0e8bcf3c ("blk-mq: add a flags parameter to blk_mq_alloc_request") 88459642cba4 ("blk-mq: abstract tag allocation out into sbitmap library") 8e8320c9315c ("blk-mq: fix performance regression with shared tags") 9467f85960a3 ("blk-mq/cpu-notif: Convert to new hotplug state machine") 97889f9ac24f ("blk-mq: remove synchronize_rcu() from blk_mq_del_queue_tag_set()") 9e0e252a048b ("badblocks: Add core badblock management code") bd166ef183c2 ("blk-mq-sched: add framework for MQ capable IO schedulers") cf43e6be865a ("block: add scalable completion tracking of requests") e57690fe009b ("blk-mq: don't overwrite rq->mq_ctx") f1ba82616c33 ("blk-mq: pass bio to blk_mq_sched_get_rq_priv")
NOTE: The patch will not be queued to stable trees until it is upstream.
How should we proceed with this patch?
linux-stable-mirror@lists.linaro.org