On Tue, Dec 2, 2025 at 6:10 AM Pratyush Yadav <pratyush@kernel.org> wrote:
On Mon, Dec 01 2025, Pasha Tatashin wrote:
On Wed, Nov 26, 2025 at 2:36 PM David Matlack <dmatlack@google.com> wrote:
[...]
FLB Locking
I don't see a way to properly synchronize pci_flb_finish() with pci_liveupdate_incoming_is_preserved() since the incoming FLB mutex is dropped by liveupdate_flb_get_incoming() when it returns the pointer to the object, and taking pci_flb_incoming_lock in pci_flb_finish() could result in a deadlock due to reversing the lock ordering.
My mental model for FLB is that it is a dependency for files, so it should always be created (aka prepare) before _any_ of the files, and always destroyed (aka finish) after _all_ of the files.
By the time the FLB is being finished, all the files for that FLB should also be finished, so there should no longer be a user of the FLB.
Once all of the files are finished, it should be LUO's responsibility to make sure liveupdate_flb_get_incoming() returns an error _before_ it starts doing the FLB finish. And in FLB finish you should not be needing to take any locks.
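A rough userspace model of that lifecycle, as a sketch only. The names (struct flb_state, flb_state_get_incoming(), flb_state_file_finished()) are hypothetical and are not the LUO API, and a pthread mutex stands in for LUO's internal lock. The FLB exists before any file (nr_files is set at init), retrieval starts failing as soon as the last dependent file finishes, and the subsystem's finish work runs afterwards with no locks held.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of an incoming FLB; not the real LUO structures. */
struct flb_state {
	pthread_mutex_t lock;  /* stands in for LUO's internal lock */
	int live_files;        /* preserved files still depending on this FLB */
	bool finished;         /* once true, get_incoming() must fail */
	void *obj;             /* subsystem state handed out to callers */
};

/* The FLB is set up before any of its files, so it starts with all of them live. */
static void flb_state_init(struct flb_state *flb, int nr_files, void *obj)
{
	pthread_mutex_init(&flb->lock, NULL);
	flb->live_files = nr_files;
	flb->finished = false;
	flb->obj = obj;
}

/* Returns 0 and stores the object, or -ENOENT once finish has begun. */
static int flb_state_get_incoming(struct flb_state *flb, void **objp)
{
	int ret = 0;

	pthread_mutex_lock(&flb->lock);
	if (flb->finished)
		ret = -ENOENT;
	else
		*objp = flb->obj;
	pthread_mutex_unlock(&flb->lock);
	return ret;
}

/*
 * Called as each dependent file finishes. Returns true for the last file:
 * by then get_incoming() already fails, so the caller may run the
 * subsystem's FLB finish without taking any locks.
 */
static bool flb_state_file_finished(struct flb_state *flb)
{
	bool last;

	pthread_mutex_lock(&flb->lock);
	assert(flb->live_files > 0);
	flb->live_files--;
	last = flb->live_files == 0;
	if (last)
		flb->finished = true; /* close the door before teardown */
	pthread_mutex_unlock(&flb->lock);
	return last;
}
```

Note that this model alone does not protect a caller that obtained the pointer just before the last file finished; it only guarantees that no new caller can obtain it once finish has started.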
Why do you want to use the FLB when it is being finished?
The next patch looks at the PCI FLB any time a device is probed, which could race with the last device file getting finished and cause the FLB to be freed.
However, it looks like I am going to drop that patch. But the PCI FLB is still used in PATCH 08 [1] whenever userspace opens a VFIO cdev or issues the VFIO_GROUP_GET_DEVICE_FD ioctl, to check whether the underlying PCI device was preserved. Offline, Jason suggested decoupling those checks from the FLB, so I'll look into doing that in the next version.
[1] https://lore.kernel.org/kvm/20251126193608.2678510-9-dmatlack@google.com/
I will re-introduce _lock/_unlock API to solve this issue.
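One possible shape for a _lock/_unlock style accessor, sketched under assumptions: the names (struct flb_incoming, flb_incoming_lock(), flb_incoming_unlock(), flb_incoming_finish()) are hypothetical, and a pthread mutex stands in for the FLB's incoming mutex. The caller keeps the lock held for as long as it uses the object, so a check such as "was this device preserved?" cannot race with finish, and the subsystem needs no lock of its own (which is what caused the lock-ordering inversion above).

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical sketch; not the real LUO FLB API. */
struct flb_incoming {
	pthread_mutex_t lock;
	bool finished;
	void *obj;
};

static void flb_incoming_init(struct flb_incoming *flb, void *obj)
{
	pthread_mutex_init(&flb->lock, NULL);
	flb->finished = false;
	flb->obj = obj;
}

/*
 * On success returns 0 with flb->lock held; *objp stays valid until the
 * caller calls flb_incoming_unlock(). On failure the lock is not held.
 */
static int flb_incoming_lock(struct flb_incoming *flb, void **objp)
{
	pthread_mutex_lock(&flb->lock);
	if (flb->finished) {
		pthread_mutex_unlock(&flb->lock);
		return -ENOENT;
	}
	*objp = flb->obj;
	return 0;
}

static void flb_incoming_unlock(struct flb_incoming *flb)
{
	pthread_mutex_unlock(&flb->lock);
}

/*
 * Finish path: waits for any current lock holder, marks the FLB finished,
 * and returns the object so the caller can tear it down after the lock
 * is dropped, i.e. without nesting subsystem locks inside this one.
 */
static void *flb_incoming_finish(struct flb_incoming *flb)
{
	void *obj;

	pthread_mutex_lock(&flb->lock);
	assert(!flb->finished); /* finish must run only once */
	flb->finished = true;
	obj = flb->obj;
	flb->obj = NULL;
	pthread_mutex_unlock(&flb->lock);
	return obj;
}
```

A user such as the VFIO open path would call flb_incoming_lock(), do its lookup, and call flb_incoming_unlock(); once flb_incoming_finish() has run, every later flb_incoming_lock() fails, so teardown outside the lock is safe.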
FLB Retrieving
The first patch of this series includes a fix to prevent an FLB from being retrieved again after it is finished. I am wondering if this is the right approach or if subsystems are expected to stop calling liveupdate_flb_get_incoming() after an FLB is finished.
IMO once the FLB is finished, LUO should make sure it cannot be retrieved, mainly so subsystem code is simpler and less bug-prone.
+1, and I think Pasha is going to do that in the next version of FLB.