On Wed, Nov 26, 2025 at 2:36 PM David Matlack <dmatlack@google.com> wrote:
This series adds the base support to preserve a VFIO device file across a Live Update. "Base support" means that this allows userspace to safely preserve a VFIO device file with LIVEUPDATE_SESSION_PRESERVE_FD and retrieve a preserved VFIO device file with LIVEUPDATE_SESSION_RETRIEVE_FD, but the device itself is not preserved in a fully running state across Live Update.
This series unblocks 2 parallel but related streams of work:
iommufd preservation across Live Update. This work spans iommufd, the IOMMU subsystem, and IOMMU drivers [1].
Preservation of VFIO device state across Live Update (config space, BAR addresses, power state, SR-IOV state, etc.). This work spans both VFIO and the core PCI subsystem.
While we need all of the above to fully preserve a VFIO device across a Live Update without disrupting the workload on the device, this series aims to be functional and safe enough to merge as the first incremental step toward that goal.
Areas for Discussion
BDF Stability across Live Update
The PCI support for tracking preserved devices across a Live Update to prevent auto-probing relies on PCI segment numbers and BDFs remaining stable. For now I have disallowed VFs, as the BDFs assigned to VFs can vary depending on how the kernel chooses to allocate bus numbers. For non-VFs I am wondering if anything more is needed to ensure BDF stability across Live Update.
While we would like to support many different systems and configurations in due time (including preserving VFs), I'd like to keep this first series constrained to simple use-cases.
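To illustrate why tracking depends on stable numbering, here is a minimal userspace sketch of identifying a device by segment plus BDF. The struct and helper names (pci_id, pci_id_key, pci_id_matches) are hypothetical and are not taken from the series; only the field widths (8-bit bus, 5-bit device, 3-bit function) are standard PCI.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical identity of a preserved device: segment + bus/device/function. */
struct pci_id {
	uint16_t segment; /* PCI segment (domain) number */
	uint8_t bus;      /* 8-bit bus number */
	uint8_t dev;      /* 5-bit device (slot) number */
	uint8_t fn;       /* 3-bit function number */
};

/* Pack into one 32-bit key: segment[31:16] bus[15:8] dev[7:3] fn[2:0]. */
static uint32_t pci_id_key(struct pci_id id)
{
	return ((uint32_t)id.segment << 16) | ((uint32_t)id.bus << 8) |
	       ((uint32_t)(id.dev & 0x1f) << 3) | (uint32_t)(id.fn & 0x07);
}

/* A device seen after Live Update matches only if every field is unchanged. */
static bool pci_id_matches(struct pci_id before, struct pci_id after)
{
	return pci_id_key(before) == pci_id_key(after);
}
```

With this kind of key, a device at 0000:3b:00.1 that comes back at 0000:3c:00.1 after the update no longer matches, so it would not be recognized as preserved. That is the failure mode a bus renumbering would cause.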
FLB Locking
I don't see a way to properly synchronize pci_flb_finish() with pci_liveupdate_incoming_is_preserved() since the incoming FLB mutex is dropped by liveupdate_flb_get_incoming() when it returns the pointer to the object, and taking pci_flb_incoming_lock in pci_flb_finish() could result in a deadlock due to reversing the lock ordering.
I will re-introduce the _lock/_unlock API to solve this issue.
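For illustration only, a minimal userspace sketch of how a _lock/_unlock style accessor could close this race. Every name below (flb_incoming, flb_incoming_lock, flb_incoming_unlock, incoming_is_preserved, flb_finish) is hypothetical and is not the actual FLB API, and a pthread mutex stands in for the kernel mutex. The idea is that the getter returns the object with its lock still held, so the lookup and finish serialize on one lock and no second lock is needed.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the incoming FLB object and its mutex. */
struct flb_incoming {
	pthread_mutex_t lock;
	bool finished;             /* set once flb_finish() has run */
	unsigned int preserved[4]; /* keys of preserved devices */
	size_t nr_preserved;
};

static struct flb_incoming pci_flb = {
	.lock = PTHREAD_MUTEX_INITIALIZER,
	.preserved = { 0x3b00 },
	.nr_preserved = 1,
};

/* Return the object with its lock HELD, or NULL if it is already finished. */
static struct flb_incoming *flb_incoming_lock(struct flb_incoming *flb)
{
	pthread_mutex_lock(&flb->lock);
	if (flb->finished) {
		pthread_mutex_unlock(&flb->lock);
		return NULL;
	}
	return flb;
}

/* Release the lock taken by a successful flb_incoming_lock(). */
static void flb_incoming_unlock(struct flb_incoming *flb)
{
	pthread_mutex_unlock(&flb->lock);
}

/* The whole lookup runs under the FLB lock, so finish cannot race with it. */
static bool incoming_is_preserved(struct flb_incoming *flb, unsigned int key)
{
	struct flb_incoming *obj = flb_incoming_lock(flb);
	bool found = false;

	if (!obj)
		return false;
	for (size_t i = 0; i < obj->nr_preserved; i++)
		if (obj->preserved[i] == key)
			found = true;
	flb_incoming_unlock(obj);
	return found;
}

/* Finish takes the same single lock, so there is no lock ordering to reverse. */
static void flb_finish(struct flb_incoming *flb)
{
	pthread_mutex_lock(&flb->lock);
	flb->finished = true;
	flb->nr_preserved = 0;
	pthread_mutex_unlock(&flb->lock);
}
```

In this model a caller can never observe a half-torn-down object: it either gets the pointer with the lock held, or gets NULL once finish has completed.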
FLB Retrieving
The first patch of this series includes a fix to prevent an FLB from being retrieved again after it is finished. I am wondering if this is the right approach or if subsystems are expected to stop calling liveupdate_flb_get_incoming() after an FLB is finished.
Thanks, I will include this fix in the next version of FLB.