From: Parav Pandit <parav@nvidia.com>
Sent: Tuesday, March 5, 2024 12:06 PM
When the PCI device is surprise removed, in-flight requests are never completed by the device, and disk deletion hangs indefinitely.

Fix it by aborting the I/Os that the device will never complete when the VQ is broken.

With this fix, fio now completes swiftly. An alternative of an I/O timeout was considered; however, a timeout can take a very long time. For an unresponsive device, quickly completing the requests with an error enables the user and upper layers to react quickly.

Verified with multiple device unplug cycles, with pending I/Os in the virtio used ring and some still pending with the device.
Also verified without surprise removal.
Fixes: 43bb40c5b926 ("virtio_pci: Support surprise removal of virtio pci device")
Cc: stable@vger.kernel.org
Reported-by: lirongqing@baidu.com
Closes: https://lore.kernel.org/virtualization/c45dd68698cd47238c55fb73ca9b4741@baidu.com/
Co-developed-by: Chaitanya Kulkarni <kch@nvidia.com>
Signed-off-by: Chaitanya Kulkarni <kch@nvidia.com>
Signed-off-by: Parav Pandit <parav@nvidia.com>
Please ignore this patch; I am still debugging and verifying it. It was incorrectly CCed to stable. I am sorry for the noise.
changelog: v0->v1:
- addressed comments from Ming and Michael
- changed the flow to no longer depend on a broken VQ, to avoid the race of missing the detection
- updated the flow to quiesce the request queue and the device, then synchronize with any ongoing interrupt handlers or callbacks, and finally finish with an error the requests that the device has not completed (a condensed sketch of this ordering follows this list)
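To make the ordering above easier to follow, here is a condensed, annotated sketch of the new removal flow. It only restates what virtblk_remove() in the diff below does; the function name virtblk_remove_outline is made up for illustration, and it relies on the driver's existing struct virtio_blk, virtblk_done() and virtblk_cancel_request().

/* Outline of the teardown ordering; the real change is virtblk_remove()
 * in the diff below.
 */
static void virtblk_remove_outline(struct virtio_device *vdev)
{
	struct virtio_blk *vblk = vdev->priv;
	int i;

	/* 1. Quiesce the queue so the block layer dispatches no new requests. */
	blk_mq_quiesce_queue(vblk->disk->queue);

	/* 2. Reset the device under vdev_mutex; this stops all VQs and config
	 *    change notifications and synchronizes with pending interrupt
	 *    handlers.
	 */
	mutex_lock(&vblk->vdev_mutex);
	virtio_reset_device(vdev);
	mutex_unlock(&vblk->vdev_mutex);

	/* 3. Drain any completions the device already posted and sync with
	 *    callbacks.
	 */
	for (i = 0; i < vblk->num_vqs; i++)
		virtblk_done(vblk->vqs[i].vq);
	blk_sync_queue(vblk->disk->queue);

	/* 4. Block layer and device are now quiet; complete the requests the
	 *    device will never finish with VIRTIO_BLK_S_IOERR and wait for
	 *    them.
	 */
	blk_mq_tagset_busy_iter(&vblk->tag_set, virtblk_cancel_request, vblk);
	blk_mq_tagset_wait_completed_request(&vblk->tag_set);

	/* 5. Unquiesce so pending dispatchers can proceed; del_gendisk() later
	 *    marks the disk dead, failing any new I/O in bio_queue_enter().
	 */
	blk_mq_unquiesce_queue(vblk->disk->queue);
}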
drivers/block/virtio_blk.c | 46 +++++++++++++++++++++++++++++++++++-
1 file changed, 43 insertions(+), 3 deletions(-)
diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c
index 2bf14a0e2815..1956172b4b1a 100644
--- a/drivers/block/virtio_blk.c
+++ b/drivers/block/virtio_blk.c
@@ -1562,9 +1562,52 @@ static int virtblk_probe(struct virtio_device *vdev)
 	return err;
 }
 
+static bool virtblk_cancel_request(struct request *rq, void *data)
+{
+	struct virtblk_req *vbr = blk_mq_rq_to_pdu(rq);
+
+	vbr->in_hdr.status = VIRTIO_BLK_S_IOERR;
+	if (blk_mq_request_started(rq) && !blk_mq_request_completed(rq))
+		blk_mq_complete_request(rq);
+
+	return true;
+}
+
 static void virtblk_remove(struct virtio_device *vdev)
 {
 	struct virtio_blk *vblk = vdev->priv;
+	int i;
+
+	/* Block upper layer to not get any new requests */
+	blk_mq_quiesce_queue(vblk->disk->queue);
+
+	mutex_lock(&vblk->vdev_mutex);
+
+	/* Stop all the virtqueues and configuration change notification and
+	 * also synchronize with pending interrupt handlers.
+	 */
+	virtio_reset_device(vdev);
+	mutex_unlock(&vblk->vdev_mutex);
+
+	/* Synchronize with any callback handlers for request completion */
+	for (i = 0; i < vblk->num_vqs; i++)
+		virtblk_done(vblk->vqs[i].vq);
+
+	blk_sync_queue(vblk->disk->queue);
+
+	/* At this point block layer and device/transport are quiet;
+	 * hence, safely complete all the pending requests with error.
+	 */
+	blk_mq_tagset_busy_iter(&vblk->tag_set, virtblk_cancel_request, vblk);
+	blk_mq_tagset_wait_completed_request(&vblk->tag_set);
+
+	/*
+	 * Unblock any pending dispatch I/Os before we destroy device. From
+	 * del_gendisk() -> __blk_mark_disk_dead(disk) will set GD_DEAD flag,
+	 * that will make sure any new I/O from bio_queue_enter() to fail.
+	 */
+	blk_mq_unquiesce_queue(vblk->disk->queue);
 
 	/* Make sure no work handler is accessing the device. */
 	flush_work(&vblk->config_work);
@@ -1574,9 +1617,6 @@ static void virtblk_remove(struct virtio_device *vdev)
 
 	mutex_lock(&vblk->vdev_mutex);
 
-	/* Stop all the virtqueues. */
-	virtio_reset_device(vdev);
-
 	/* Virtqueues are stopped, nothing can use vblk->vdev anymore. */
 	vblk->vdev = NULL;
 
-- 
2.34.1