On 5/22/25 20:32, Bart Van Assche wrote:
On 5/22/25 10:38 AM, Jens Axboe wrote:
On 5/22/25 11:14 AM, Bart Van Assche wrote:
static void __submit_bio(struct bio *bio) { /* If plug is not used, add new plug here to cache nsecs time. */ @@ -633,8 +640,12 @@ static void __submit_bio(struct bio *bio) if (!bdev_test_flag(bio->bi_bdev, BD_HAS_SUBMIT_BIO)) { blk_mq_submit_bio(bio);
- } else if (likely(bio_queue_enter(bio) == 0)) {
- } else { struct gendisk *disk = bio->bi_bdev->bd_disk;
bool zwp = bio_zone_write_plugging(bio);
if (unlikely(!zwp && bio_queue_enter(bio) != 0))
if ((bio->bi_opf & REQ_POLLED) && !(disk->queue->limits.features & BLK_FEAT_POLL)) {goto finish_plug;
@@ -643,9 +654,12 @@ static void __submit_bio(struct bio *bio) } else { disk->fops->submit_bio(bio); }
blk_queue_exit(disk->queue);
if (!zwp)
}blk_queue_exit(disk->queue);
This is pretty ugly, and I honestly absolutely hate how there's quite a bit of zoned_whatever sprinkling throughout the core code. What's the reason for not unplugging here, unaligned writes? Because you should presumable have the exact same issues on non-zoned devices if they have IO stuck in a plug (and doesn't get unplugged) while someone is waiting on a freeze.
A somewhat similar case was solved for IOPOLL and queue entering. That would be another thing to look at. Maybe a live enter could work if the plug itself pins it?
Hi Jens,
q->q_usage_counter is not increased for bios on current->plug_list. q->q_usage_counter is increased before a bio is added to the zoned pluglist. So these two cases are different.
I think it is important to hold a q->q_usage_counter reference for bios on the zoned plug list because bios are added to that list after bio splitting happened. Hence, request queue limits must not change while any bio is on the zoned plug list.
Damien, do you think it would be possible to add a bio to the zoned plug list before it is split and not to hold q->q_usage_counter for bios on the zoned plug list?
I do not think this is the right thing to do as that will completely mess up the queue usage counter value with regard to submit_bio() calls. That value is incremented when a BIO is submitted, and that should stay the same for zoned write BIOs that endup in a zone write plug instead of going straigth to the device.
The issue comes from the fact that blk_zone_wplug_bio_work() calls submit_bio_noacct_nocheck() which eventually calls __submit_bio() and that function calls blk_queue_enter() for DM devices. We still need to preserve that for the first submission of the BIO but should not/do not need to do the blk_queue_enter() call again when a BIO is submitted via the zone write plug work. I am not sure yet how to best deal with that.
I am traveling today and will not have time to cook something. Will take a look over the weekend.