Hi Alex,
On Wed, Mar 18, 2026 at 8:04 PM Alex Williamson <alex@shazbot.org> wrote:
On Thu, 12 Mar 2026 11:46:02 -0700 Matt Evans <mattev@meta.com> wrote:
This helper, vfio_pci_core_mmap_prep_dmabuf(), creates a single-range DMABUF for the purpose of mapping a PCI BAR. This is used in a future commit by VFIO's ordinary mmap() path.
This function transfers ownership of the VFIO device fd to the DMABUF, which fput()s when it's released.
Refactor the existing vfio_pci_core_feature_dma_buf() to split out export code common to the two paths, VFIO_DEVICE_FEATURE_DMA_BUF and this new VFIO_BAR mmap().
Signed-off-by: Matt Evans <mattev@meta.com>
 drivers/vfio/pci/vfio_pci_dmabuf.c | 131 +++++++++++++++++++++--------
 drivers/vfio/pci/vfio_pci_priv.h   |   4 +
 2 files changed, 102 insertions(+), 33 deletions(-)
diff --git a/drivers/vfio/pci/vfio_pci_dmabuf.c b/drivers/vfio/pci/vfio_pci_dmabuf.c
index 63140528dbea..76db340ba592 100644
--- a/drivers/vfio/pci/vfio_pci_dmabuf.c
+++ b/drivers/vfio/pci/vfio_pci_dmabuf.c
@@ -82,6 +82,8 @@ static void vfio_pci_dma_buf_release(struct dma_buf *dmabuf)
 		up_write(&priv->vdev->memory_lock);
 		vfio_device_put_registration(&priv->vdev->vdev);
 	}
+	if (priv->vfile)
+		fput(priv->vfile);
 	kfree(priv->phys_vec);
 	kfree(priv);
 }
@@ -182,6 +184,41 @@ int vfio_pci_dma_buf_find_pfn(struct vfio_pci_dma_buf *vpdmabuf,
 	return -EFAULT;
 }

+static int vfio_pci_dmabuf_export(struct vfio_pci_core_device *vdev,
+				  struct vfio_pci_dma_buf *priv, uint32_t flags,
+				  size_t size, bool status_ok)
+{
+	DEFINE_DMA_BUF_EXPORT_INFO(exp_info);
+
+	if (!vfio_device_try_get_registration(&vdev->vdev))
+		return -ENODEV;
+
+	exp_info.ops = &vfio_pci_dmabuf_ops;
+	exp_info.size = size;
+	exp_info.flags = flags;
+	exp_info.priv = priv;
+
+	priv->dmabuf = dma_buf_export(&exp_info);
+	if (IS_ERR(priv->dmabuf)) {
+		vfio_device_put_registration(&vdev->vdev);
+		return PTR_ERR(priv->dmabuf);
+	}
+
+	kref_init(&priv->kref);
+	init_completion(&priv->comp);
+
+	/* dma_buf_put() now frees priv */
+	INIT_LIST_HEAD(&priv->dmabufs_elm);
+	down_write(&vdev->memory_lock);
+	dma_resv_lock(priv->dmabuf->resv, NULL);
+	priv->revoked = !status_ok;

Testing __vfio_pci_memory_enabled() outside of memory_lock() is invalid, so passing it as a parameter outside of the semaphore is invalid. @status_ok is stale here.
So it is, arrrrrgh. Thank you for that; I've found a couple of other choice bugs in that RFC, and will resolve all of this in a repost soon.
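To make the stale-parameter problem concrete, here is a minimal userspace sketch. It is not the kernel code and not the fix planned for the repost: `struct fake_dev`, `struct fake_buf`, `export_stale()` and `export_locked()` are hypothetical names, and a pthread mutex stands in for the `memory_lock` semaphore. The first variant takes a pre-sampled `status_ok`, as the RFC does; the second reads the state only once the lock is held.

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for the device; not kernel code. */
struct fake_dev {
	pthread_mutex_t memory_lock;	/* plays the role of vdev->memory_lock */
	bool memory_enabled;		/* what __vfio_pci_memory_enabled() would report */
};

/* Hypothetical stand-in for struct vfio_pci_dma_buf. */
struct fake_buf {
	bool revoked;
};

static void fake_dev_init(struct fake_dev *dev, bool enabled)
{
	pthread_mutex_init(&dev->memory_lock, NULL);
	dev->memory_enabled = enabled;
}

/* Models a config-space write that toggles memory decode under the lock. */
static void fake_dev_set_memory(struct fake_dev *dev, bool enabled)
{
	pthread_mutex_lock(&dev->memory_lock);
	dev->memory_enabled = enabled;
	pthread_mutex_unlock(&dev->memory_lock);
}

/*
 * Problematic pattern: status_ok was sampled by the caller before the
 * lock was taken, so it may no longer match dev->memory_enabled.
 */
static void export_stale(struct fake_dev *dev, struct fake_buf *buf,
			 bool status_ok)
{
	pthread_mutex_lock(&dev->memory_lock);
	buf->revoked = !status_ok;
	pthread_mutex_unlock(&dev->memory_lock);
}

/* One possible alternative: sample the state only while holding the lock. */
static void export_locked(struct fake_dev *dev, struct fake_buf *buf)
{
	pthread_mutex_lock(&dev->memory_lock);
	buf->revoked = !dev->memory_enabled;
	pthread_mutex_unlock(&dev->memory_lock);
}
```

If memory is disabled between the caller's sample and the export, `export_stale()` leaves the buffer marked as not revoked, while `export_locked()` marks it revoked.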
[snip]
+	/*
+	 * The VMA gets the DMABUF file so that other users can locate
+	 * the DMABUF via a VA. Ownership of the original VFIO device
+	 * file being mmap()ed transfers to priv, and is put when the
+	 * DMABUF is released.
+	 */
+	priv->vfile = vma->vm_file;
+	vma->vm_file = priv->dmabuf->file;

AIUI, this affects what the user sees in /proc/<pid>/maps, right? Previously a memory range could be clearly associated with a specific vfio device, now, only for vfio-pci devices, I think the range is associated to a nondescript dmabuf. If so, is that an acceptable, user visible, debugging friendly change (ex. lsof)? Thanks,
(Jason, your comment noted with thanks, replying to you both here to save electrons.)
Great question; a formatting change there is inherent to moving to a DMABUF (which prepends a "/dmabuf:" prefix to a user-defined name). If we can accept that it changes at all, then I agree it should give useful debug output: at least the cdev name and resource index, and we have the opportunity to include the BDF too. I've added this; here is an example line from /proc/<pid>/maps:
ffffb8070000-ffffbc040000 rw-s 00030000 00:0b 5 /dmabuf:vfio0:0000:00:03.0/1
Note the file offset used to include the resource index up at VFIO_PCI_OFFSET_SHIFT, but this DMABUF version doesn't do that, so I'm proposing appending a "/%u" for the index. The line above is a map of BAR1 at offset 0x30000. If people feel strongly about the existing aesthetic, we could keep the index encoded in vm_pgoff to retain the same offset field in /proc/<pid>/maps, but masking it back out in a few places would be less neat.
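For reference, the existing offset encoding mentioned above can be sketched in userspace as follows. The shift value of 40 and the mask are assumed to match the VFIO_PCI_OFFSET_SHIFT and VFIO_PCI_OFFSET_MASK definitions in include/linux/vfio_pci_core.h; the three helper functions are hypothetical names for illustration only.

```c
#include <stdint.h>

/* Assumed to mirror the kernel's VFIO PCI region offset layout. */
#define VFIO_PCI_OFFSET_SHIFT	40
#define VFIO_PCI_OFFSET_MASK	((UINT64_C(1) << VFIO_PCI_OFFSET_SHIFT) - 1)

/* Hypothetical helper: region index -> base file offset of that region. */
static uint64_t index_to_offset(uint32_t index)
{
	return (uint64_t)index << VFIO_PCI_OFFSET_SHIFT;
}

/* Hypothetical helper: file offset -> region (BAR) index. */
static uint32_t offset_to_index(uint64_t off)
{
	return (uint32_t)(off >> VFIO_PCI_OFFSET_SHIFT);
}

/* Hypothetical helper: file offset -> byte offset within the region. */
static uint64_t offset_in_region(uint64_t off)
{
	return off & VFIO_PCI_OFFSET_MASK;
}
```

Under this layout, BAR1 at offset 0x30000 corresponds to a file offset of 0x10000030000, whereas the DMABUF-backed mapping in the example line shows only 00030000 and carries the index in the name instead.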
The default name of a DMABUF acquired through VFIO_DEVICE_FEATURE_DMA_BUF would still be "/dmabuf:", and I think it should stay this way, since a better name should be supplied by userspace. The default at least differentiates them from VFIO device fd mappings.
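As a rough illustration of the name proposed for mmap()ed BARs (as opposed to the bare default), the string after the "/dmabuf:" prefix could be built along these lines. `format_bar_name()` and its parameters are hypothetical and do not reflect the actual code in the repost; the format string is only inferred from the example maps line.

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical helper: builds "<cdev>:<domain>:<bus>:<slot>.<fn>/<index>",
 * e.g. "vfio0:0000:00:03.0/1". Returns what snprintf() returns, so the
 * caller can detect truncation by comparing the result against len.
 */
static int format_bar_name(char *buf, size_t len, const char *cdev,
			   unsigned int domain, unsigned int bus,
			   unsigned int slot, unsigned int fn,
			   unsigned int index)
{
	return snprintf(buf, len, "%s:%04x:%02x:%02x.%u/%u",
			cdev, domain, bus, slot, fn, index);
}
```

Calling it with "vfio0", domain 0, bus 0, slot 3, function 0 and index 1 produces the name seen in the example maps line.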
Many thanks,
Matt
Alex
+	vma->vm_private_data = priv;
+	return 0;
+
+err_free_phys:
+	kfree(priv->phys_vec);
+err_free_priv:
+	kfree(priv);
+	return ret;
+}

 void vfio_pci_dma_buf_move(struct vfio_pci_core_device *vdev, bool revoked)
 {
 	struct vfio_pci_dma_buf *priv;
diff --git a/drivers/vfio/pci/vfio_pci_priv.h b/drivers/vfio/pci/vfio_pci_priv.h
index 5cc8c85a2153..5fd3a6e00a0e 100644
--- a/drivers/vfio/pci/vfio_pci_priv.h
+++ b/drivers/vfio/pci/vfio_pci_priv.h
@@ -30,6 +30,7 @@ struct vfio_pci_dma_buf {
 	size_t size;
 	struct phys_vec *phys_vec;
 	struct p2pdma_provider *provider;
+	struct file *vfile;
 	u32 nr_ranges;
 	struct kref kref;
 	struct completion comp;
@@ -128,6 +129,9 @@ int vfio_pci_dma_buf_find_pfn(struct vfio_pci_dma_buf *vpdmabuf,
 			      unsigned long address, unsigned int order,
 			      unsigned long *out_pfn);

+int vfio_pci_core_mmap_prep_dmabuf(struct vfio_pci_core_device *vdev,
+				   struct vm_area_struct *vma,
+				   u64 phys_start, u64 pgoff, u64 req_len);
 #ifdef CONFIG_VFIO_PCI_DMABUF
 int vfio_pci_core_feature_dma_buf(struct vfio_pci_core_device *vdev, u32 flags,