On Fri, Jul 25, 2025 at 05:34:40AM +0000, Kasireddy, Vivek wrote:
Hi Leon,
Subject: Re: [PATCH 10/10] vfio/pci: Add dma-buf export support for MMIO regions
From: Leon Romanovsky <leonro@nvidia.com>
Add support for exporting PCI device MMIO regions through dma-buf, enabling safe sharing of non-struct page memory with controlled lifetime management. This allows RDMA and other subsystems to import dma-buf FDs and build them into memory regions for PCI P2P operations.
The implementation provides a revocable attachment mechanism using dma-buf move operations. MMIO regions are normally pinned as BARs don't change physical addresses, but access is revoked when the VFIO device is closed or a PCI reset is issued. This ensures kernel self-defense against potentially hostile userspace.
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Vivek Kasireddy <vivek.kasireddy@intel.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
 drivers/vfio/pci/Kconfig           |  20 ++
 drivers/vfio/pci/Makefile          |   2 +
 drivers/vfio/pci/vfio_pci_config.c |  22 +-
 drivers/vfio/pci/vfio_pci_core.c   |  25 ++-
 drivers/vfio/pci/vfio_pci_dmabuf.c | 321 +++++++++++++++++++++++++++++
 drivers/vfio/pci/vfio_pci_priv.h   |  23 +++
 include/linux/dma-buf.h            |   1 +
 include/linux/vfio_pci_core.h      |   3 +
 include/uapi/linux/vfio.h          |  19 ++
 9 files changed, 431 insertions(+), 5 deletions(-)
 create mode 100644 drivers/vfio/pci/vfio_pci_dmabuf.c
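The revoke described above follows the usual dynamic dma-buf exporter pattern. As a rough sketch only (the vfio_pci_dma_buf struct, the per-device list, and the revoked flag below are guesses at the elided implementation, not the patch itself):

	/* Guessed bookkeeping: one of these per exported dmabuf */
	struct vfio_pci_dma_buf {
		struct dma_buf *dmabuf;
		struct vfio_pci_core_device *vdev;
		struct list_head dmabufs_elm;
		bool revoked;
	};

	/* Called when the VFIO device is closed or a PCI reset is issued */
	static void vfio_pci_dma_buf_revoke_all(struct vfio_pci_core_device *vdev)
	{
		struct vfio_pci_dma_buf *priv;

		list_for_each_entry(priv, &vdev->dmabufs, dmabufs_elm) {
			dma_resv_lock(priv->dmabuf->resv, NULL);
			priv->revoked = true;
			/* Wakes importers via their move_notify callback;
			 * later mapping attempts see revoked == true. */
			dma_buf_move_notify(priv->dmabuf);
			dma_resv_unlock(priv->dmabuf->resv);
		}
	}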
<...>
+static int validate_dmabuf_input(struct vfio_pci_core_device *vdev,
+				 struct vfio_device_feature_dma_buf *dma_buf)
+{
+	struct pci_dev *pdev = vdev->pdev;
+	u32 bar = dma_buf->region_index;
+	u64 offset = dma_buf->offset;
+	u64 len = dma_buf->length;
+	resource_size_t bar_size;
+	u64 sum;
+
+	/*
+	 * For PCI the region_index is the BAR number like everything else.
+	 */
+	if (bar >= VFIO_PCI_ROM_REGION_INDEX)
+		return -ENODEV;
<...>
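A guess at how the elided checks might continue, using the locals declared in the hunk above (this is not the actual patch body):

	if (!(pci_resource_flags(pdev, bar) & IORESOURCE_MEM))
		return -EINVAL;

	/* Reject a slice that wraps around or runs past the end of the BAR */
	if (check_add_overflow(offset, len, &sum))
		return -EINVAL;
	bar_size = pci_resource_len(pdev, bar);
	if (!len || sum > bar_size)
		return -EINVAL;

	return 0;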
+/**
+ * Upon VFIO_DEVICE_FEATURE_GET create a dma_buf fd for the
+ * regions selected.
+ *
+ * open_flags are the typical flags passed to open(2), eg O_RDWR, O_CLOEXEC,
+ * etc. offset/length specify a slice of the region to create the dmabuf from.
+ * nr_ranges is the total number of (P2P DMA) ranges that comprise the dmabuf.
Any particular reason why you dropped the option (nr_ranges) of creating a single dmabuf from multiple ranges of an MMIO region?
I did it for two reasons. First, I wanted to simplify the code in order to speed up discussion of the patchset itself. Second, I failed to find justification for the need for multiple ranges: the number of BARs is limited by VFIO_PCI_ROM_REGION_INDEX (6), and the same functionality can be achieved by multiple calls to DMABUF import.
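To illustrate the "multiple calls" point: userspace would simply invoke the feature ioctl once per contiguous slice and collect several dmabuf FDs. A minimal sketch, assuming the UAPI from this patch (VFIO_DEVICE_FEATURE_DMA_BUF and the payload layout are not in any upstream header yet):

	#include <fcntl.h>
	#include <sys/ioctl.h>
	#include <linux/vfio.h>

	/* Returns a dma-buf fd for one contiguous slice of a BAR */
	static int get_mmio_dmabuf(int device_fd, __u32 bar, __u64 offset, __u64 len)
	{
		char buf[sizeof(struct vfio_device_feature) +
			 sizeof(struct vfio_device_feature_dma_buf)] = { 0 };
		struct vfio_device_feature *feature = (void *)buf;
		struct vfio_device_feature_dma_buf *payload = (void *)feature->data;

		feature->argsz = sizeof(buf);
		feature->flags = VFIO_DEVICE_FEATURE_GET | VFIO_DEVICE_FEATURE_DMA_BUF;
		payload->region_index = bar;
		payload->open_flags = O_RDWR | O_CLOEXEC;
		payload->offset = offset;
		payload->length = len;

		/* On success the ioctl return value is the new dma-buf fd */
		return ioctl(device_fd, VFIO_DEVICE_FEATURE, feature);
	}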
I don't think the same functionality can be achieved by multiple calls to dmabuf import. AFAIU, a dmabuf (as of today) is backed by an SGL that can have multiple entries because it represents a scattered buffer (multiple non-contiguous entries in system RAM or an MMIO region).
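To be concrete, once an importer maps the attachment it walks one DMA address/length pair per SGL entry, so a single dmabuf naturally describes a scattered buffer (use_range() below is just a stand-in for whatever the importer does with each chunk):

	struct sg_table *sgt;
	struct scatterlist *sg;
	int i;

	sgt = dma_buf_map_attachment(attach, DMA_BIDIRECTIONAL);
	if (IS_ERR(sgt))
		return PTR_ERR(sgt);

	/* Each entry may point at a different, non-contiguous chunk */
	for_each_sgtable_dma_sg(sgt, sg, i)
		use_range(sg_dma_address(sg), sg_dma_len(sg));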
I don't know all the reasons why SG was chosen, but one of the main ones is that the DMA SG API was the only possible way to handle p2p transfers (the peer2peer flag).
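For reference, that flag lives in the importer-side attach ops; a minimal sketch (my_move_notify() is a placeholder for the importer's callback):

	static const struct dma_buf_attach_ops my_attach_ops = {
		/* Importer can handle PCI P2P bus addresses in the SGL */
		.allow_peer2peer = true,
		.move_notify = my_move_notify,
	};

	attach = dma_buf_dynamic_attach(dmabuf, dev, &my_attach_ops, NULL);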
But in this patch you are constraining it such that only one entry associated with a buffer would be included, which effectively means that we cannot create a dmabuf to represent scattered buffers (located in a single MMIO region such as VRAM or other device memory) anymore.
Yes
Restricting the dmabuf to a single range (or having to create multiple dmabufs to represent multiple regions/ranges associated with a single scattered buffer) would be very limiting and may not work in all cases. For instance, in my use-case, I am trying to share a large (4k mode) framebuffer (FB) located in a GPU's VRAM between two (p2p compatible) GPU devices. And this would probably not work, given that allocating a large contiguous FB (nr_ranges = 1) in VRAM may not be feasible when there is memory pressure.
Can you please help me and point to the place in the code where this can fail? I'm probably missing something basic as there are no large allocations in the current patchset.
Sorry, I was not very clear. What I meant is that it is not prudent to assume that there will only be one range associated with an MMIO region which we need to consider while creating a dmabuf. And, I was pointing out my use-case as an example where vfio-pci needs to create a dmabuf for a large buffer (FB) that would likely be scattered (and not contiguous) in an MMIO region (such as VRAM).
Let me further explain with my use-case. Here is a link to my Qemu-based test: https://gitlab.freedesktop.org/Vivek/qemu/-/commit/b2bdb16d9cfaf55384c95b1f0...
Ohh, thanks. I'll add nr_ranges in the next version. I see that you are using the same region_index for all ranges, and this is how I would like to keep it: "multiple nr_ranges, same region_index".
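Roughly along these lines; nothing is final, and the field names and layout below are only a sketch of what the next revision might add:

	struct vfio_region_dma_range {
		__u64	offset;		/* into the BAR named by region_index */
		__u64	length;
	};

	struct vfio_device_feature_dma_buf {
		__u32	region_index;	/* one BAR for all ranges */
		__u32	open_flags;
		__u32	flags;
		__u32	nr_ranges;
		struct vfio_region_dma_range dma_ranges[];
	};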
Thanks