On Fri, Aug 22, 2025 at 12:17:06PM +0300, Parav Pandit wrote:
This reverts commit 43bb40c5b926 ("virtio_pci: Support surprise removal of virtio pci device").
Virtio drivers and PCI devices have never fully supported true surprise (aka hot unplug) removal. Drivers historically continued processing and waiting for pending I/O and even continued synchronous device reset during surprise removal. Devices have also continued completing I/Os, doing DMA and allowing device reset after surprise removal to support such drivers.
Supporting it correctly would require a new device capability
If a device is removed, it is removed. Windows drivers supported this since forever and it's just a Linux bug that it does not handle all the cases. This is not something you can handle with a capability.
and driver negotiation in the virtio specification to safely stop I/O and free queue memory. Failure to do so either breaks all the existing drivers with call trace listed in the commit or crashes the host on continuing the DMA.
If the device is gone, then DMA does not continue.
IIUC what is going on for you, is that you have developed a surprise removal emulation that pretends to remove the device but actually the device is doing DMA. So of course things break then.
Hence, until such specification and devices are invented, restore the previous behavior of treating surprise removal as graceful removal to avoid regressions and maintain system stability same as before the commit 43bb40c5b926 ("virtio_pci: Support surprise removal of virtio pci device").
As explained above, previous analysis of solving this only in driver was incomplete and non-reliable at [1] and at [2]; Hence reverting commit 43bb40c5b926 ("virtio_pci: Support surprise removal of virtio pci device") is still the best stand to restore failures of virtio net and block devices.
[1] https://lore.kernel.org/virtualization/CY8PR12MB719506CC5613EB100BC6C638DCBD...
I can only repeat what I said then, this is not how we do kernel development.
[2] https://lore.kernel.org/virtualization/20250602024358.57114-1-parav@nvidia.c...
What was missing here, is handling corner cases. So let us please try to handle them.
Here is how I would try to do it:
- add a new driver callback - start a periodic timer task in virtio core on remove - in the timer, probe that the device is still present. if not, invoke a driver callback - cancel the task on device reset
If you do not have the time, let me know and I will try to look into it.
Fixes: 43bb40c5b926 ("virtio_pci: Support surprise removal of virtio pci device") Cc: stable@vger.kernel.org Reported-by: lirongqing@baidu.com Closes: https://lore.kernel.org/virtualization/c45dd68698cd47238c55fb73ca9b4741@baid... Signed-off-by: Parav Pandit parav@nvidia.com
drivers/virtio/virtio_pci_common.c | 7 ------- 1 file changed, 7 deletions(-)
diff --git a/drivers/virtio/virtio_pci_common.c b/drivers/virtio/virtio_pci_common.c index d6d79af44569..dba5eb2eaff9 100644 --- a/drivers/virtio/virtio_pci_common.c +++ b/drivers/virtio/virtio_pci_common.c @@ -747,13 +747,6 @@ static void virtio_pci_remove(struct pci_dev *pci_dev) struct virtio_pci_device *vp_dev = pci_get_drvdata(pci_dev); struct device *dev = get_device(&vp_dev->vdev.dev);
- /*
* Device is marked broken on surprise removal so that virtio upper
* layers can abort any ongoing operation.
*/
- if (!pci_device_is_present(pci_dev))
virtio_break_device(&vp_dev->vdev);
- pci_disable_sriov(pci_dev);
unregister_virtio_device(&vp_dev->vdev); -- 2.26.2