From: Michael S. Tsirkin <mst@redhat.com> Sent: 22 August 2025 07:32 PM
On Fri, Aug 22, 2025 at 01:53:02PM +0000, Parav Pandit wrote:
From: Michael S. Tsirkin <mst@redhat.com> Sent: 22 August 2025 06:35 PM
On Fri, Aug 22, 2025 at 12:24:06PM +0000, Parav Pandit wrote:
From: Li,Rongqing <lirongqing@baidu.com> Sent: 22 August 2025 03:57 PM
This reverts commit 43bb40c5b926 ("virtio_pci: Support surprise removal of virtio pci device").
Virtio drivers and PCI devices have never fully supported true surprise (aka hot unplug) removal. Drivers historically continued processing and waiting for pending I/O and even continued synchronous device reset during surprise removal. Devices have also continued completing I/Os, doing DMA and allowing device reset after surprise removal to support such drivers.
Supporting it correctly would require a new device capability and driver negotiation in the virtio specification to safely stop I/O and free queue memory.
Failure to do so either breaks all the existing drivers with call trace listed in the commit or crashes the host on continuing the DMA.
Hence, until such specification and devices are invented, restore the previous behavior of treating surprise removal as graceful removal to avoid regressions and maintain system stability same as before the commit 43bb40c5b926 ("virtio_pci: Support surprise removal of virtio pci device").
As explained above, previous analysis of solving this only in driver was incomplete and non-reliable at [1] and at [2]; hence reverting commit 43bb40c5b926 ("virtio_pci: Support surprise removal of virtio pci device") is still the best way to resolve the failures of virtio net and block devices.
[1] https://lore.kernel.org/virtualization/CY8PR12MB719506CC5613EB100BC6C638DCBD2@CY8PR12MB7195.namprd12.prod.outlook.com/#t
[2] https://lore.kernel.org/virtualization/20250602024358.57114-1-parav@nvidia.com/
Fixes: 43bb40c5b926 ("virtio_pci: Support surprise removal of virtio pci device")
Cc: stable@vger.kernel.org
Reported-by: lirongqing@baidu.com
Closes: https://lore.kernel.org/virtualization/c45dd68698cd47238c55fb73ca9b4741@baidu.com/
Signed-off-by: Parav Pandit <parav@nvidia.com>
Tested-by: Li RongQing <lirongqing@baidu.com>
Thanks
-Li
Multiple users are blocked, waiting to have this fix in a stable kernel.
What are these users doing that is blocked by this fix?
Not sure I understand the question. Let me try to answer. They are unable to dynamically add/remove the virtio net, block, fs devices in their systems.
Users have their networking applications running over NS network and database and file system through these devices.
Some of them keep reverting the patch. Some are unable to. They are in search of a stable kernel.
Did I understand your question?
Not really, sorry.
Does the system or does it not have a mechanical interlock?
It is a modern system, beyond mechanical interlock, but it has the ability for surprise removal.
If it does, how does a user run into surprise removal issues without the ability to remove the device?
The user has the ability to surprise-remove a device from the slot via the slot's PCI registers. Yet the device is capable enough to fulfil the needs of broken drivers which are waiting for the pending requests to arrive.
If it does not, and a user pulls out the working device, how does your patch help?
A driver must tell the device that it will not follow the broken ancient behaviour, and at that point the device would stop its ancient backward compatibility mode.
-- MST