This patch on itself does not change the behavior of either ioctl. However, this patch is necessary to avoid that these ioctls fail with -EIO if sd_revalidate_disk() is called while these ioctls are in progress because the current zoned block command code temporarily clears data that is needed by these ioctls. See also commit 3ed05a987e0f ("blk-zoned: implement ioctls").
Signed-off-by: Bart Van Assche bart.vanassche@wdc.com Cc: Martin K. Petersen martin.petersen@oracle.com Cc: Damien Le Moal damien.lemoal@wdc.com Cc: Christoph Hellwig hch@lst.de Cc: Hannes Reinecke hare@suse.com Cc: Shaun Tancheff shaun@tancheff.com Cc: stable@vger.kernel.org # v4.10 --- block/blk-zoned.c | 42 +++++++++++++++++++++++++++++------------- 1 file changed, 29 insertions(+), 13 deletions(-)
diff --git a/block/blk-zoned.c b/block/blk-zoned.c index 20bfc37e1852..acc71e8c3473 100644 --- a/block/blk-zoned.c +++ b/block/blk-zoned.c @@ -127,15 +127,19 @@ int blkdev_report_zones(struct block_device *bdev, if (!q) return -ENXIO;
+ blk_queue_enter(q, 0); + + ret = -EOPNOTSUPP; if (!blk_queue_is_zoned(q)) - return -EOPNOTSUPP; + goto exit_queue;
+ ret = 0; if (!nrz) - return 0; + goto exit_queue;
if (sector > bdev->bd_part->nr_sects) { *nr_zones = 0; - return 0; + goto exit_queue; }
/* @@ -154,9 +158,10 @@ int blkdev_report_zones(struct block_device *bdev, nr_pages = min_t(unsigned int, nr_pages, queue_max_segments(q));
+ ret = -ENOMEM; bio = bio_alloc(gfp_mask, nr_pages); if (!bio) - return -ENOMEM; + goto exit_queue;
bio_set_dev(bio, bdev); bio->bi_iter.bi_sector = blk_zone_start(q, sector); @@ -166,7 +171,7 @@ int blkdev_report_zones(struct block_device *bdev, page = alloc_page(gfp_mask); if (!page) { ret = -ENOMEM; - goto out; + goto put_bio; } if (!bio_add_page(bio, page, PAGE_SIZE, 0)) { __free_page(page); @@ -179,7 +184,7 @@ int blkdev_report_zones(struct block_device *bdev, else ret = submit_bio_wait(bio); if (ret) - goto out; + goto put_bio;
/* * Process the report result: skip the header and go through the @@ -222,11 +227,14 @@ int blkdev_report_zones(struct block_device *bdev, }
*nr_zones = nz; -out: +put_bio: bio_for_each_segment_all(bv, bio, i) __free_page(bv->bv_page); bio_put(bio);
+exit_queue: + blk_queue_exit(q); + return ret; } EXPORT_SYMBOL_GPL(blkdev_report_zones); @@ -256,21 +264,25 @@ int blkdev_reset_zones(struct block_device *bdev, if (!q) return -ENXIO;
+ blk_queue_enter(q, 0); + + ret = -EOPNOTSUPP; if (!blk_queue_is_zoned(q)) - return -EOPNOTSUPP; + goto out;
+ ret = -EINVAL; if (end_sector > bdev->bd_part->nr_sects) /* Out of range */ - return -EINVAL; + goto out;
/* Check alignment (handle eventual smaller last zone) */ zone_sectors = blk_queue_zone_sectors(q); if (sector & (zone_sectors - 1)) - return -EINVAL; + goto out;
if ((nr_sectors & (zone_sectors - 1)) && end_sector != bdev->bd_part->nr_sects) - return -EINVAL; + goto out;
while (sector < end_sector) {
@@ -283,7 +295,7 @@ int blkdev_reset_zones(struct block_device *bdev, bio_put(bio);
if (ret) - return ret; + goto out;
sector += zone_sectors;
@@ -292,7 +304,11 @@ int blkdev_reset_zones(struct block_device *bdev,
}
- return 0; + ret = 0; + +out: + blk_queue_exit(q); + return ret; } EXPORT_SYMBOL_GPL(blkdev_reset_zones);
On Mon, Apr 16, 2018 at 06:00:34PM -0700, Bart Van Assche wrote:
This patch on itself does not change the behavior of either ioctl. However, this patch is necessary to avoid that these ioctls fail with -EIO if sd_revalidate_disk() is called while these ioctls are in progress because the current zoned block command code temporarily clears data that is needed by these ioctls. See also commit 3ed05a987e0f ("blk-zoned: implement ioctls").
Hmm. I think we need to avoid clearing that data and update it using RCU instead. Calling blk_queue_enter before submitting bios is something that would make zone reporting very different from any other block layer user.
On Tue, 2018-04-17 at 17:18 +0200, Christoph Hellwig wrote:
On Mon, Apr 16, 2018 at 06:00:34PM -0700, Bart Van Assche wrote:
This patch on itself does not change the behavior of either ioctl. However, this patch is necessary to avoid that these ioctls fail with -EIO if sd_revalidate_disk() is called while these ioctls are in progress because the current zoned block command code temporarily clears data that is needed by these ioctls. See also commit 3ed05a987e0f ("blk-zoned: implement ioctls").
Hmm. I think we need to avoid clearing that data and update it using RCU instead. Calling blk_queue_enter before submitting bios is something that would make zone reporting very different from any other block layer user.
Hello Christoph,
As you know some struct members that contain zoned block device information are in struct request queue and others are in struct scsi_disk:
struct scsi_disk { [ ... ] #ifdef CONFIG_BLK_DEV_ZONED u32 nr_zones; u32 zone_blocks; u32 zone_shift; u32 zones_optimal_open; u32 zones_optimal_nonseq; u32 zones_max_open; #endif [ ... ] };
struct request_queue { [ ... ] unsigned int nr_zones; unsigned long *seq_zones_bitmap; unsigned long *seq_zones_wlock; [ ... ] };
Did you perhaps mean to move these members into two new structures, to use RCU to update the pointers to these structures and also to protect code that reads these pointers with rcu_read_lock() / rcu_read_unlock()? If so, how to make sure that both pointers get updated simultaneously such that no code ever sees a pointer to the old information in e.g. struct scsi_disk and a pointer to the new information in struct request_queue?
Thanks,
Bart.
On Tue, 2018-04-17 at 17:18 +0200, Christoph Hellwig wrote:
On Mon, Apr 16, 2018 at 06:00:34PM -0700, Bart Van Assche wrote:
This patch on itself does not change the behavior of either ioctl. However, this patch is necessary to avoid that these ioctls fail with -EIO if sd_revalidate_disk() is called while these ioctls are in progress because the current zoned block command code temporarily clears data that is needed by these ioctls. See also commit 3ed05a987e0f ("blk-zoned: implement ioctls").
Hmm. I think we need to avoid clearing that data and update it using RCU instead. Calling blk_queue_enter before submitting bios is something that would make zone reporting very different from any other block layer user.
Hello Christoph,
Please have a look at generic_make_request(). I think that function shows that blk_queue_enter() is called anyway before .make_request_fn() is called.
Thanks,
Bart.
On Tue, Apr 17, 2018 at 05:35:07PM +0000, Bart Van Assche wrote:
Hmm. I think we need to avoid clearing that data and update it using RCU instead. Calling blk_queue_enter before submitting bios is something that would make zone reporting very different from any other block layer user.
Hello Christoph,
Please have a look at generic_make_request(). I think that function shows that blk_queue_enter() is called anyway before .make_request_fn() is called.
Yes, that is the point. We already call blk_queue_enter in generic_make_request, which the zone report and reset ops pass through.
Christoph,
On 2018/04/17 8:18, Christoph Hellwig wrote:
On Mon, Apr 16, 2018 at 06:00:34PM -0700, Bart Van Assche wrote:
This patch on itself does not change the behavior of either ioctl. However, this patch is necessary to avoid that these ioctls fail with -EIO if sd_revalidate_disk() is called while these ioctls are in progress because the current zoned block command code temporarily clears data that is needed by these ioctls. See also commit 3ed05a987e0f ("blk-zoned: implement ioctls").
Hmm. I think we need to avoid clearing that data and update it using RCU instead. Calling blk_queue_enter before submitting bios is something that would make zone reporting very different from any other block layer user.
With the scsi patches that Bart sent, the data is not cleared anymore until it is ready to be updated all at once, in between calls to blk_mq_freeze_queue() and blk_mq_unfreeze_queue(). This prevents the scheduler (deadline and mq-deadline) from using partially incorrect data since the queue freeze guarantees no activity on the scheduler side during the data swap. But for the BIO issuing side, the calls to blk_queue_enter() ensures the same.
This all allows to leave unmodified all the blk-zoned.c helper functions that use the request queue information (bitmaps and number of zones), without any need for rcu lock calls. I think this is a simpler approach.
Best regards.
linux-stable-mirror@lists.linaro.org