On 5/23/25 2:10 AM, Damien Le Moal wrote:
On 5/22/25 19:38, Jens Axboe wrote:
On 5/22/25 11:14 AM, Bart Van Assche wrote:
blk_mq_freeze_queue() never terminates if one or more bios are on the plug list and if the block device driver defines a .submit_bio() method. This is the case for device mapper drivers. The deadlock happens because blk_mq_freeze_queue() waits for q_usage_counter to drop to zero, because a queue reference is held by bios on the plug list and because the __bio_queue_enter() call in __submit_bio() waits for the queue to be unfrozen.
This patch fixes the following deadlock:
Workqueue: dm-51_zwplugs blk_zone_wplug_bio_work Call trace: __schedule+0xb08/0x1160 schedule+0x48/0xc8 __bio_queue_enter+0xcc/0x1d0 __submit_bio+0x100/0x1b0 submit_bio_noacct_nocheck+0x230/0x49c blk_zone_wplug_bio_work+0x168/0x250 process_one_work+0x26c/0x65c worker_thread+0x33c/0x498 kthread+0x110/0x134 ret_from_fork+0x10/0x20
Call trace: __switch_to+0x230/0x410 __schedule+0xb08/0x1160 schedule+0x48/0xc8 blk_mq_freeze_queue_wait+0x78/0xb8 blk_mq_freeze_queue+0x90/0xa4 queue_attr_store+0x7c/0xf0 sysfs_kf_write+0x98/0xc8 kernfs_fop_write_iter+0x12c/0x1d4 vfs_write+0x340/0x3ac ksys_write+0x78/0xe8
Cc: Christoph Hellwig hch@lst.de Cc: Damien Le Moal dlemoal@kernel.org Cc: Yu Kuai yukuai1@huaweicloud.com Cc: Ming Lei ming.lei@redhat.com Cc: stable@vger.kernel.org Fixes: dd291d77cc90 ("block: Introduce zone write plugging") Signed-off-by: Bart Van Assche bvanassche@acm.org
Changes compared to v1: fixed a race condition. Call bio_zone_write_plugging() only before submitting the bio and not after it has been submitted.
block/blk-core.c | 18 ++++++++++++++++-- 1 file changed, 16 insertions(+), 2 deletions(-)
diff --git a/block/blk-core.c b/block/blk-core.c index b862c66018f2..713fb3865260 100644 --- a/block/blk-core.c +++ b/block/blk-core.c @@ -621,6 +621,13 @@ static inline blk_status_t blk_check_zone_append(struct request_queue *q, return BLK_STS_OK; } +/*
- Do not call bio_queue_enter() if the BIO_ZONE_WRITE_PLUGGING flag has been
- set because this causes blk_mq_freeze_queue() to deadlock if
- blk_zone_wplug_bio_work() submits a bio. Calling bio_queue_enter() for bios
- on the plug list is not necessary since a q_usage_counter reference is held
- while a bio is on the plug list.
- */
static void __submit_bio(struct bio *bio) { /* If plug is not used, add new plug here to cache nsecs time. */ @@ -633,8 +640,12 @@ static void __submit_bio(struct bio *bio) if (!bdev_test_flag(bio->bi_bdev, BD_HAS_SUBMIT_BIO)) { blk_mq_submit_bio(bio);
- } else if (likely(bio_queue_enter(bio) == 0)) {
- } else { struct gendisk *disk = bio->bi_bdev->bd_disk;
bool zwp = bio_zone_write_plugging(bio);
if (unlikely(!zwp && bio_queue_enter(bio) != 0))
if ((bio->bi_opf & REQ_POLLED) && !(disk->queue->limits.features & BLK_FEAT_POLL)) {goto finish_plug;
@@ -643,9 +654,12 @@ static void __submit_bio(struct bio *bio) } else { disk->fops->submit_bio(bio); }
blk_queue_exit(disk->queue);
if (!zwp)
}blk_queue_exit(disk->queue);
This is pretty ugly, and I honestly absolutely hate how there's quite a bit of zoned_whatever sprinkling throughout the core code. What's the reason for not unplugging here, unaligned writes? Because you should presumable have the exact same issues on non-zoned devices if they have IO stuck in a plug (and doesn't get unplugged) while someone is waiting on a freeze.
A somewhat similar case was solved for IOPOLL and queue entering. That would be another thing to look at. Maybe a live enter could work if the plug itself pins it?
What about this patch, completely untested...
It's exactly the same thing, more zoned sprinkling in code that should not need to care. Only difference is that it moved to a function.