This reverts commit 43bb40c5b926 ("virtio_pci: Support surprise removal of virtio pci device").
The cited commit introduced a fix that marks the device as broken during surprise removal. However, this approach causes uncompleted I/O requests on virtio-blk device. The presence of uncompleted I/O requests prevents the successful removal of virtio-blk devices.
This fix allows devices that simulate a surprise removal but actually remove gracefully to continue working as before.
For surprise removals, a better solution will be preferred in the future.
Fixes: 43bb40c5b926 ("virtio_pci: Support surprise removal of virtio pci device") Cc: stable@vger.kernel.org Reported-by: lirongqing@baidu.com Closes: https://lore.kernel.org/virtualization/c45dd68698cd47238c55fb73ca9b4741@baid... Reviewed-by: Max Gurtovoymgurtovoy@nvidia.com Signed-off-by: Parav Pandit parav@nvidia.com --- drivers/virtio/virtio_pci_common.c | 7 ------- 1 file changed, 7 deletions(-)
diff --git a/drivers/virtio/virtio_pci_common.c b/drivers/virtio/virtio_pci_common.c index d6d79af44569..dba5eb2eaff9 100644 --- a/drivers/virtio/virtio_pci_common.c +++ b/drivers/virtio/virtio_pci_common.c @@ -747,13 +747,6 @@ static void virtio_pci_remove(struct pci_dev *pci_dev) struct virtio_pci_device *vp_dev = pci_get_drvdata(pci_dev); struct device *dev = get_device(&vp_dev->vdev.dev);
- /* - * Device is marked broken on surprise removal so that virtio upper - * layers can abort any ongoing operation. - */ - if (!pci_device_is_present(pci_dev)) - virtio_break_device(&vp_dev->vdev); - pci_disable_sriov(pci_dev);
unregister_virtio_device(&vp_dev->vdev);
On Tue, Apr 08, 2025 at 05:59:08PM +0300, Parav Pandit wrote:
This reverts commit 43bb40c5b926 ("virtio_pci: Support surprise removal of virtio pci device").
The cited commit introduced a fix that marks the device as broken during surprise removal. However, this approach causes uncompleted I/O requests on virtio-blk device. The presence of uncompleted I/O requests prevents the successful removal of virtio-blk devices.
This fix allows devices that simulate a surprise removal but actually remove gracefully to continue working as before.
For surprise removals, a better solution will be preferred in the future.
Sorry I'm not breaking one thing to fix another. Device is gone so no new requests will be completed. Why not complete all unfinished requests, for example?
Come up with a proper fix pls.
Fixes: 43bb40c5b926 ("virtio_pci: Support surprise removal of virtio pci device") Cc: stable@vger.kernel.org Reported-by: lirongqing@baidu.com Closes: https://lore.kernel.org/virtualization/c45dd68698cd47238c55fb73ca9b4741@baid... Reviewed-by: Max Gurtovoymgurtovoy@nvidia.com Signed-off-by: Parav Pandit parav@nvidia.com
drivers/virtio/virtio_pci_common.c | 7 ------- 1 file changed, 7 deletions(-)
diff --git a/drivers/virtio/virtio_pci_common.c b/drivers/virtio/virtio_pci_common.c index d6d79af44569..dba5eb2eaff9 100644 --- a/drivers/virtio/virtio_pci_common.c +++ b/drivers/virtio/virtio_pci_common.c @@ -747,13 +747,6 @@ static void virtio_pci_remove(struct pci_dev *pci_dev) struct virtio_pci_device *vp_dev = pci_get_drvdata(pci_dev); struct device *dev = get_device(&vp_dev->vdev.dev);
- /*
* Device is marked broken on surprise removal so that virtio upper
* layers can abort any ongoing operation.
*/
- if (!pci_device_is_present(pci_dev))
virtio_break_device(&vp_dev->vdev);
- pci_disable_sriov(pci_dev);
unregister_virtio_device(&vp_dev->vdev); -- 2.26.2
Hi Michael,
From: Michael S. Tsirkin mst@redhat.com Sent: Wednesday, April 9, 2025 1:45 AM
On Tue, Apr 08, 2025 at 05:59:08PM +0300, Parav Pandit wrote:
This reverts commit 43bb40c5b926 ("virtio_pci: Support surprise removal of
virtio pci device").
The cited commit introduced a fix that marks the device as broken during surprise removal. However, this approach causes uncompleted I/O requests on virtio-blk device. The presence of uncompleted I/O requests prevents the successful removal of virtio-blk devices.
This fix allows devices that simulate a surprise removal but actually remove gracefully to continue working as before.
For surprise removals, a better solution will be preferred in the future.
Sorry I'm not breaking one thing to fix another. Device is gone so no new requests will be completed. Why not complete all unfinished requests, for example?
Come up with a proper fix pls.
I would also like to have a proper fix that can be backportable. However, an attempt [1] had race. To overcome the race, a different approach also tried, however the block layer was stuck even if all requests in virtio-blk driver layer was completed like you suggested.
It appeared that supporting uncompleted requests won't be so straightforward to backport.
Hence, the request is to revert and restore the previous behavior. This at least improves the case where the OS thinks that surprise removal occurred, but the device eventually completes the IO. And hence, virtio block driver successfully unloads. And virtio-net also does not experience the mentioned crash.
[1] https://lore.kernel.org/all/20240217180848.241068-1-parav@nvidia.com/
Fixes: 43bb40c5b926 ("virtio_pci: Support surprise removal of virtio pci device") Cc: stable@vger.kernel.org Reported-by: lirongqing@baidu.com Closes: https://lore.kernel.org/virtualization/c45dd68698cd47238c55fb73ca9b474 1@baidu.com/ Reviewed-by: Max Gurtovoymgurtovoy@nvidia.com Signed-off-by: Parav Pandit parav@nvidia.com
drivers/virtio/virtio_pci_common.c | 7 ------- 1 file changed, 7 deletions(-)
diff --git a/drivers/virtio/virtio_pci_common.c b/drivers/virtio/virtio_pci_common.c index d6d79af44569..dba5eb2eaff9 100644 --- a/drivers/virtio/virtio_pci_common.c +++ b/drivers/virtio/virtio_pci_common.c @@ -747,13 +747,6 @@ static void virtio_pci_remove(struct pci_dev
*pci_dev)
struct virtio_pci_device *vp_dev = pci_get_drvdata(pci_dev); struct device *dev = get_device(&vp_dev->vdev.dev);
/*
* Device is marked broken on surprise removal so that virtio upper
* layers can abort any ongoing operation.
*/
if (!pci_device_is_present(pci_dev))
virtio_break_device(&vp_dev->vdev);
pci_disable_sriov(pci_dev);
unregister_virtio_device(&vp_dev->vdev);
-- 2.26.2
On Wed, Apr 09, 2025 at 01:50:18PM +0000, Parav Pandit wrote:
Hi Michael,
From: Michael S. Tsirkin mst@redhat.com Sent: Wednesday, April 9, 2025 1:45 AM
On Tue, Apr 08, 2025 at 05:59:08PM +0300, Parav Pandit wrote:
This reverts commit 43bb40c5b926 ("virtio_pci: Support surprise removal of
virtio pci device").
The cited commit introduced a fix that marks the device as broken during surprise removal. However, this approach causes uncompleted I/O requests on virtio-blk device. The presence of uncompleted I/O requests prevents the successful removal of virtio-blk devices.
This fix allows devices that simulate a surprise removal but actually remove gracefully to continue working as before.
For surprise removals, a better solution will be preferred in the future.
Sorry I'm not breaking one thing to fix another. Device is gone so no new requests will be completed. Why not complete all unfinished requests, for example?
Come up with a proper fix pls.
I would also like to have a proper fix that can be backportable. However, an attempt [1] had race. To overcome the race, a different approach also tried, however the block layer was stuck even if all requests in virtio-blk driver layer was completed like you suggested.
It appeared that supporting uncompleted requests won't be so straightforward to backport.
Hence, the request is to revert and restore the previous behavior. This at least improves the case where the OS thinks that surprise removal occurred, but the device eventually completes the IO. And hence, virtio block driver successfully unloads. And virtio-net also does not experience the mentioned crash.
[1] https://lore.kernel.org/all/20240217180848.241068-1-parav@nvidia.com/
Parav this is a commit from 2021. I am not reverting it "because it seems to help". We'll never make progress like this. You will have to debug and fix it properly. Sorry.
Once we have a fix, we will worry about backports and stuff, this is how we do kernel development here.
Fixes: 43bb40c5b926 ("virtio_pci: Support surprise removal of virtio pci device") Cc: stable@vger.kernel.org Reported-by: lirongqing@baidu.com Closes: https://lore.kernel.org/virtualization/c45dd68698cd47238c55fb73ca9b474 1@baidu.com/ Reviewed-by: Max Gurtovoymgurtovoy@nvidia.com Signed-off-by: Parav Pandit parav@nvidia.com
drivers/virtio/virtio_pci_common.c | 7 ------- 1 file changed, 7 deletions(-)
diff --git a/drivers/virtio/virtio_pci_common.c b/drivers/virtio/virtio_pci_common.c index d6d79af44569..dba5eb2eaff9 100644 --- a/drivers/virtio/virtio_pci_common.c +++ b/drivers/virtio/virtio_pci_common.c @@ -747,13 +747,6 @@ static void virtio_pci_remove(struct pci_dev
*pci_dev)
struct virtio_pci_device *vp_dev = pci_get_drvdata(pci_dev); struct device *dev = get_device(&vp_dev->vdev.dev);
/*
* Device is marked broken on surprise removal so that virtio upper
* layers can abort any ongoing operation.
*/
if (!pci_device_is_present(pci_dev))
virtio_break_device(&vp_dev->vdev);
pci_disable_sriov(pci_dev);
unregister_virtio_device(&vp_dev->vdev);
-- 2.26.2
From: Michael S. Tsirkin mst@redhat.com Sent: Wednesday, April 9, 2025 9:32 PM
On Wed, Apr 09, 2025 at 01:50:18PM +0000, Parav Pandit wrote:
Hi Michael,
From: Michael S. Tsirkin mst@redhat.com Sent: Wednesday, April 9, 2025 1:45 AM
On Tue, Apr 08, 2025 at 05:59:08PM +0300, Parav Pandit wrote:
This reverts commit 43bb40c5b926 ("virtio_pci: Support surprise removal of
virtio pci device").
The cited commit introduced a fix that marks the device as broken during surprise removal. However, this approach causes uncompleted I/O requests on virtio-blk device. The presence of uncompleted I/O requests prevents the successful removal of virtio-blk devices.
This fix allows devices that simulate a surprise removal but actually remove gracefully to continue working as before.
For surprise removals, a better solution will be preferred in the future.
Sorry I'm not breaking one thing to fix another. Device is gone so no new requests will be completed. Why not complete all unfinished requests, for example?
Come up with a proper fix pls.
I would also like to have a proper fix that can be backportable. However, an attempt [1] had race. To overcome the race, a different approach also tried, however the block
layer was stuck even if all requests in virtio-blk driver layer was completed like you suggested.
It appeared that supporting uncompleted requests won't be so
straightforward to backport.
Hence, the request is to revert and restore the previous behavior. This at least improves the case where the OS thinks that surprise removal
occurred, but the device eventually completes the IO.
And hence, virtio block driver successfully unloads. And virtio-net also does not experience the mentioned crash.
[1] https://lore.kernel.org/all/20240217180848.241068-1-parav@nvidia.com/
Parav this is a commit from 2021. I am not reverting it "because it seems to help". We'll never make progress like this. You will have to debug and fix it properly. Sorry.
Once we have a fix, we will worry about backports and stuff, this is how we do kernel development here.
Ok. I will post the candidate patch. Will likely need help to fix it. Will ask Stefano. Thanks.
linux-stable-mirror@lists.linaro.org