On Tue, Jun 16, 2026 at 09:48:14AM +0000, Tian, Kevin wrote:
From: Pranjal Shrivastava praan@google.com Sent: Saturday, June 13, 2026 3:39 AM
On Wed, Jun 10, 2026 at 04:43:20PM +0100, Matt Evans wrote:
@@ -1264,7 +1265,7 @@ static int vfio_pci_ioctl_reset(struct
vfio_pci_core_device *vdev,
if (!vdev->reset_works) return -EINVAL;
- vfio_pci_zap_and_down_write_memory_lock(vdev);
down_write(&vdev->memory_lock);
/*
- This function can be invoked while the power state is non-D0. If
@@ -1277,10 +1278,11 @@ static int vfio_pci_ioctl_reset(struct
vfio_pci_core_device *vdev,
*/vfio_pci_set_power_state(vdev, PCI_D0);
- vfio_pci_dma_buf_move(vdev, true);
- vfio_pci_zap_revoke_bars(vdev);
I'm wondering if this change in behavior is correct? BEFORE this patch the sequence was:
- zap vma mappings
- Enter D0
After this patch the sequence becomes
- Take the lock
- Enter D0
- zap vma mappings
My worry is if user-space accesses a BAR *during* the transition to D0, it could crash since the mappings still exist during the transition?
not 'crash' as you also noted later with all Fs on read and dropped writes.
Ack, "crash" is definitely a strong word, I just meant that the user-space program isn't expecting to see all Fs today. Since today any access during reset is faulted, however with this all apps may have to lookout for all Fs during a read. Could this change cause existing apps to crash?
The old code is immune to it because it removed user-mappings first.
Following the discussion from v1 regarding the ordering of vfio_pci_dma_buf_move() and the D0 transition.. while it makes sense to perform the DMABUF revocation/move after the hardware is in D0.. I'm not too confident about moving zap after D0 :/
probably add a comment to remind that ordering requirement for dma
+1. That'd be helpful.
I mean, sure, the user would just see all Fs on a read and writes will be dropped silently until we are in D0.. but the behaviour before this change was that the user access will fault and hang on the memory_lock instead which ensures that the user observes a consistent dev state..
I see this more consistent from another angle.
Old code only removes/blocks cpu access but not for device. DMAs are allowed to this device while it's transitioning between D0/D3.
New code at least make this part consistent - both cpu/p2p are allowed in the transition window.
Ideally a sane userspace shouldn't rely on the content read back when it has initiated a reset in parallel. So this behavior change sounds ok?
I agree on the CPU / P2P consistency part. However, my concern is for a shared reset scenario where a reset triggered by one process (I guess it was vfio_assign_device_set?) can affect multiple devices in a dev_set that are owned by different, unrelated processes.
In the old code, these peer processes are protected because their BAR mappings are zapped immediately. Their MMIO threads simply stall in a page fault until the reset is complete.
I agree for a single-reset scenario, sane user-space should never access regions during a self-triggered reset.
Am I missing something?
Thanks, Praan