Hi all,
The September 2025 thread "[TECH TOPIC] vfio, iommufd: Enabling user space drivers to vend more granular access to client processes" [0], along with discussions at LPC, produced various suggestions for improving the situation for multi-process userspace driver designs. This RFC series implements some of these ideas.
Background: Multi-process USDs
==============================
The userspace driver (USD) scenario discussed in that thread involves a primary process that drives a PCIe function through VFIO/iommufd and manages the function-wide ownership/lifecycle. The function is designed to provide multiple distinct programming interfaces (for example, several independent MMIO register frames in one function), and the primary process delegates control of these interfaces to multiple independent client processes, which do the actual work. This scenario relies on a hardware design that provides appropriate isolation between the programming interfaces.
The two key needs are:
1. Mechanisms to safely delegate a subset of the device's MMIO resources to a client process without granting wider access (or influence over whole-device activities, such as reset).
2. Mechanisms to allow a client process to do its own iommufd management w.r.t. its address space, in a way that's isolated from DMA relating to other clients.
mmap() of VFIO DMABUFs
======================
First, this RFC addresses #1, implementing the proposals in [0] to add mmap() support to the existing VFIO DMABUF exporter.
This enables a userspace driver to define DMABUF ranges corresponding to sub-ranges of a BAR, and grant a given client (via a shared fd) the capability to access (only) those sub-ranges. The VFIO device fds would be kept private to the primary process. All the client can do with that fd is map (or iomap via iommufd) that specific subset of resources, and the impact of bugs/malice is contained.
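For illustration, a sketch of what the client side might look like. It assumes the client has received the DMABUF fd from the primary process (e.g. over a Unix socket with SCM_RIGHTS), has learned the frame's size out-of-band, and that mmap() offsets are relative to the start of the DMABUF. `client_map_frame()`, `client_unmap_frame()` and `make_standin_fd()` are hypothetical helpers; a plain temporary file stands in for the DMABUF fd so the sketch can run without a device.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical client helper: map `len` bytes of a delegated fd starting at
 * byte offset `off` (page-aligned; assumed relative to the DMABUF start).
 * Returns NULL on failure. */
static volatile uint32_t *client_map_frame(int fd, size_t len, off_t off)
{
	void *p;

	assert(len > 0);
	p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, off);
	return p == MAP_FAILED ? NULL : (volatile uint32_t *)p;
}

/* Hypothetical client helper: drop a mapping made by client_map_frame(). */
static int client_unmap_frame(volatile uint32_t *regs, size_t len)
{
	return munmap((void *)regs, len);
}

/* Test aid only: a temporary file of `len` bytes stands in for the
 * DMABUF fd, so the helpers can be exercised without a device. */
static int make_standin_fd(size_t len)
{
	FILE *f = tmpfile();
	int fd;

	if (!f)
		return -1;
	fd = dup(fileno(f)); /* our own fd, so the file outlives fclose() */
	fclose(f);
	if (fd < 0)
		return -1;
	if (ftruncate(fd, (off_t)len) != 0) {
		close(fd);
		return -1;
	}
	return fd;
}
```

Note that the client never touches the VFIO device fd: mmap() on the delegated fd is its only route to the registers.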
(We'll follow up on #2 separately, as a related-but-distinct problem. PASIDs are one way to achieve per-client isolation of DMA; another could be sharing of a single IOVA space via 'constrained' iommufds.)
Revocation/reclaim
==================
That's useful as-is, but then the lifetime of access granted to a client needs to be managed well. For example, a protocol between the primary process and the client can indicate when the client is done, and when it's safe to reuse the resources elsewhere.
Resources could be released cooperatively, but it's much more robust to enable the driver to make the resources guaranteed-inaccessible when it chooses, so that it can re-assign them to other uses in future.
So, second, I've suggested a PoC/example mechanism for reclaiming ranges shared with clients: a new DMABUF ioctl, DMA_BUF_IOCTL_REVOKE, which is routed to a DMABUF exporter callback. The VFIO DMABUF exporter's implementation permanently revokes a DMABUF (notifying importers and zapping PTEs for any mappings). This makes the DMABUF defunct: neither the client nor any third party the client has passed the buffer on to (!) can use it to access the BAR ranges, whether via DMABUF import or mmap().
A primary driver process would do this operation when the client's tenure ends to reclaim "loaned-out" MMIO interfaces, at which point the interfaces could be safely re-used.
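To make the intended semantics concrete, here is a toy userspace model of a revocable buffer; it is not the VFIO implementation. `struct model_dmabuf` and its helpers are hypothetical, the counters stand in for the mappings whose PTEs get zapped and the importers that get notified, and returning ENODEV after revocation is an assumption rather than necessarily what the series does.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Toy userspace model of a DMABUF's revocation state (not kernel code). */
struct model_dmabuf {
	bool revoked;        /* once set, never cleared: revocation is permanent */
	int accessible_maps; /* mmap()s whose PTEs still reach the BAR */
	int importers;       /* importers currently able to use the buffer */
};

/* Any fd holder (client or third party) mmap()s the buffer. */
static int model_map(struct model_dmabuf *b)
{
	if (b->revoked)
		return -ENODEV; /* assumed error code */
	b->accessible_maps++;
	return 0;
}

/* Any fd holder imports the buffer (e.g. into iommufd). */
static int model_import(struct model_dmabuf *b)
{
	if (b->revoked)
		return -ENODEV; /* assumed error code */
	b->importers++;
	return 0;
}

/* The primary process reclaims the buffer: importers are notified and
 * existing mappings have their PTEs zapped. Returns the number of
 * mappings that lost access. */
static int model_revoke(struct model_dmabuf *b)
{
	int zapped = b->accessible_maps;

	b->revoked = true;
	b->accessible_maps = 0;
	b->importers = 0; /* modelled as: notified importers stop using it */
	return zapped;
}
```

The key property is that `revoked` is one-way: whoever holds the fd afterwards, and however they try to use it, gets nothing.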
This ioctl is one of several possible approaches to achieving buffer revocation, but I wanted to demonstrate something here, as it's an important part of the buffer lifecycle in this driver scenario. An alternative implementation could be some VFIO-specific operation to search for a DMABUF (by address?) and kill it, but if the primary process keeps hold of the DMABUF fd, that fd is already a clean way to locate the buffer later.
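A sketch of the bookkeeping this implies for the primary process: it keeps its own reference to each loaned-out DMABUF, keyed by client, and uses that fd to reclaim the buffer when the client's tenure ends. `struct lease`, `lease_end()` and the `revoke_fn` hook are hypothetical; a real driver would have the hook issue the proposed DMA_BUF_IOCTL_REVOKE (its argument format is deliberately not assumed here), and `fake_revoke()` exists only to exercise the sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical primary-side record of one loaned-out interface. */
struct lease {
	int client_id;
	int dmabuf_fd; /* primary's own reference to the DMABUF; -1 if free */
};

/* Revocation is injected; a real driver would wrap the proposed ioctl. */
typedef int (*revoke_fn)(int dmabuf_fd);

/* End a client's tenure: locate its DMABUF via the fd the primary kept,
 * revoke it, then drop the primary's reference. Returns 0 on success, or
 * -1 if the client holds no lease or revocation failed. */
static int lease_end(struct lease *leases, size_t n, int client_id,
		     revoke_fn revoke)
{
	for (size_t i = 0; i < n; i++) {
		if (leases[i].client_id != client_id || leases[i].dmabuf_fd < 0)
			continue;
		if (revoke(leases[i].dmabuf_fd) != 0)
			return -1;
		close(leases[i].dmabuf_fd);
		leases[i].dmabuf_fd = -1; /* interface may now be reused */
		return 0;
	}
	return -1;
}

/* Test stand-in for the real revocation call: records the fd it saw. */
static int last_revoked_fd = -1;
static int fake_revoke(int dmabuf_fd)
{
	last_revoked_fd = dmabuf_fd;
	return 0;
}
```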
BAR mapping access attributes
=============================
Third, inspired by Alex [Mastro] and Jason's comments in [0], and by Mahmoud's work in [1] aimed at controlling CPU access attributes for VFIO BAR mappings (e.g. WC), I noticed that once VFIO DMABUFs representing BAR sub-spans can be mmap()ed, it's straightforward to decorate them with access attributes for the VMA.
I've proposed reserving a field in struct vfio_device_feature_dma_buf's flags to specify an attribute for its ranges. Although that keeps the (UAPI) struct unchanged, it means all ranges in a DMABUF share the same attribute. I feel a single attribute-to-mmap() relation is logical/reasonable. An application can also create multiple DMABUFs to describe any BAR layout and mix of attributes.
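As a sketch of how an application might cover a BAR with mixed attributes under the one-attribute-per-DMABUF rule, the helper below groups a layout into one DMABUF request per attribute. `enum bar_attr`, `struct bar_span`, `struct dmabuf_req` and `plan_dmabufs()` are illustrative names only; the actual flag encoding in struct vfio_device_feature_dma_buf is not assumed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_RANGES_PER_DMABUF 16

/* Hypothetical attribute values, not the series' actual flag encoding. */
enum bar_attr { BAR_ATTR_UC, BAR_ATTR_WC, BAR_ATTR_COUNT };

/* One sub-span of a BAR that the application wants to expose. */
struct bar_span {
	uint64_t offset; /* byte offset into the BAR */
	uint64_t length; /* byte length */
	enum bar_attr attr;
};

/* One DMABUF to create: every range in it shares `attr`. */
struct dmabuf_req {
	enum bar_attr attr;
	size_t nr_ranges;
	size_t span_idx[MAX_RANGES_PER_DMABUF]; /* indices into the span array */
};

/* Group spans by attribute, preserving their order. `reqs` must have room
 * for BAR_ATTR_COUNT entries. Returns the number of DMABUFs needed. */
static size_t plan_dmabufs(const struct bar_span *spans, size_t n,
			   struct dmabuf_req *reqs)
{
	size_t nr_reqs = 0;

	for (size_t i = 0; i < n; i++) {
		size_t r;

		for (r = 0; r < nr_reqs; r++)
			if (reqs[r].attr == spans[i].attr)
				break;
		if (r == nr_reqs) { /* first span with this attribute */
			assert(nr_reqs < BAR_ATTR_COUNT);
			reqs[r].attr = spans[i].attr;
			reqs[r].nr_ranges = 0;
			nr_reqs++;
		}
		assert(reqs[r].nr_ranges < MAX_RANGES_PER_DMABUF);
		reqs[r].span_idx[reqs[r].nr_ranges++] = i;
	}
	return nr_reqs;
}
```

For example, a layout of UC, WC, UC spans yields two requests: one DMABUF carrying both UC ranges and one carrying the WC range. Mixing attributes costs only an extra fd.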
Tests
=====
I've included an [RFC ONLY] userspace test program which I am _not_ proposing to merge, but am sharing for context. It illustrates and tests various map/revoke cases, but it doesn't use the existing VFIO selftests and relies on a (tweaked) QEMU EDU function. I'm working on integrating the scenarios into the existing VFIO selftests.
I've tested mapping DMABUFs of single/multiple ranges, aliasing mmap()s, aliasing ranges across DMABUFs, vm_pgoff > 0, revocation, and shutdown/cleanup scenarios; hugepage mappings also seem to work correctly. I've lightly tested WC mappings too (by observing that the resulting PTEs have the correct attributes...).
(The first two commits are a couple of tiny bugfixes which I can send separately, should reviewers prefer.)
This series is based on v6.19 but I expect to rebase, at least onto Leon's recent work [2] ("vfio: Wait for dma-buf invalidation to complete").
What are people's thoughts? I'll respin to de-RFC and capture comments, if we think this approach is appropriate.
Thanks,
Matt
References:
[0]: https://lore.kernel.org/linux-iommu/20250918214425.2677057-1-amastro@fb.com/
[1]: https://lore.kernel.org/all/20250804104012.87915-1-mngyadam@amazon.de/
[2]: https://lore.kernel.org/linux-iommu/20260205-nocturnal-poetic-chamois-f566ad...
Matt Evans (7):
  vfio/pci: Ensure VFIO barmap is set up before creating a DMABUF
  vfio/pci: Clean up DMABUFs before disabling function
  vfio/pci: Support mmap() of a DMABUF
  dma-buf: uapi: Mechanism to revoke DMABUFs via ioctl()
  vfio/pci: Permanently revoke a DMABUF on request
  vfio/pci: Add mmap() attributes to DMABUF feature
  [RFC ONLY] selftests: vfio: Add standalone vfio_dmabuf_mmap_test
 drivers/dma-buf/dma-buf.c                   |   5 +
 drivers/vfio/pci/vfio_pci_core.c            |   4 +-
 drivers/vfio/pci/vfio_pci_dmabuf.c          | 300 ++++++-
 include/linux/dma-buf.h                     |  22 +
 include/uapi/linux/dma-buf.h                |   1 +
 include/uapi/linux/vfio.h                   |  12 +-
 tools/testing/selftests/vfio/Makefile       |   1 +
 .../vfio/standalone/vfio_dmabuf_mmap_test.c | 822 ++++++++++++++++++
 8 files changed, 1153 insertions(+), 14 deletions(-)
 create mode 100644 tools/testing/selftests/vfio/standalone/vfio_dmabuf_mmap_test.c