On 5/23/25 04:10, Ming Lei wrote:
On Thu, May 22, 2025 at 11:32:58AM -0700, Bart Van Assche wrote:
On 5/22/25 10:38 AM, Jens Axboe wrote:
On 5/22/25 11:14 AM, Bart Van Assche wrote:
static void __submit_bio(struct bio *bio) { /* If plug is not used, add new plug here to cache nsecs time. */ @@ -633,8 +640,12 @@ static void __submit_bio(struct bio *bio) if (!bdev_test_flag(bio->bi_bdev, BD_HAS_SUBMIT_BIO)) { blk_mq_submit_bio(bio);
- } else if (likely(bio_queue_enter(bio) == 0)) {
- } else { struct gendisk *disk = bio->bi_bdev->bd_disk;
bool zwp = bio_zone_write_plugging(bio);
if (unlikely(!zwp && bio_queue_enter(bio) != 0))
if ((bio->bi_opf & REQ_POLLED) && !(disk->queue->limits.features & BLK_FEAT_POLL)) {goto finish_plug;
@@ -643,9 +654,12 @@ static void __submit_bio(struct bio *bio) } else { disk->fops->submit_bio(bio); }
blk_queue_exit(disk->queue);
if (!zwp)
}blk_queue_exit(disk->queue);
This is pretty ugly, and I honestly absolutely hate how there's quite a bit of zoned_whatever sprinkling throughout the core code. What's the reason for not unplugging here, unaligned writes? Because you should presumable have the exact same issues on non-zoned devices if they have IO stuck in a plug (and doesn't get unplugged) while someone is waiting on a freeze.
A somewhat similar case was solved for IOPOLL and queue entering. That would be another thing to look at. Maybe a live enter could work if the plug itself pins it?
Hi Jens,
q->q_usage_counter is not increased for bios on current->plug_list. q->q_usage_counter is increased before a bio is added to the zoned pluglist. So these two cases are different.
I think it is important to hold a q->q_usage_counter reference for bios on the zoned plug list because bios are added to that list after bio splitting happened. Hence, request queue limits must not change while any bio is on the zoned plug list.
Hi Bart,
Can you share why request queue limit can't be changed after bio is added to zoned plug list?
Because BIOs on a zone write plug list have already been split according to the current request queue limits. So until these BIOs are executed, we cannot change the limits as that potentially would require again splitting and that would completely messup the zone write pointer tracking of zone write plugging.
If it is really true, we may have to drain zoned plug list when freezing queue.
Yes, that is what we need. But we currently endup deadlocking on a queue freeze because the internal issuing of plugged zone write BIOs uses submit_bio_noacct_nocheck() which calls __submit_bio() and that function will call blk_queue_enter() again for DM device BIOs. We need to somehow cleanly avoid calling that queue enter without sprinkling around lots of zone stuff in this core block layer submission path.
Thanks, Ming