Hi Jason + Christian,
On 27/02/2026 12:51, Jason Gunthorpe wrote:
On Fri, Feb 27, 2026 at 11:09:31AM +0100, Christian König wrote:
When a DMA-buf just represents a linear piece of BAR which is map-able through the VFIO FD anyway then the right approach is to just re-direct the mapping to this VFIO FD.
We think limiting this to one range per DMABUF isn't enough, i.e. supporting multiple ranges will be a benefit.
Bumping vm_pgoff to then reuse vfio_pci_mmap_ops is a really nice suggestion for the simplest case, but can't support multiple ranges; the .fault() needs to be aware of the non-linear DMABUF layout.
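To illustrate why a plain vm_pgoff bump can't cover the multi-range case, here is a minimal userspace sketch of the lookup a range-aware .fault() would need. It is not the patch code: struct sketch_phys_range, sketch_pgoff_to_pfn() and the 4 KiB page size are all hypothetical names and assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 12 /* assumption: 4 KiB pages */

/* Hypothetical description of one physical range backing the DMABUF. */
struct sketch_phys_range {
	uint64_t paddr; /* page-aligned physical start address */
	uint64_t len;   /* page-aligned length in bytes */
};

/*
 * Translate a page offset within the DMABUF into a PFN by walking the
 * ranges in order. Returns 0 and stores the PFN on success, or -1 if
 * pgoff lies beyond the end of the last range.
 */
static int sketch_pgoff_to_pfn(const struct sketch_phys_range *ranges,
			       size_t nr_ranges, uint64_t pgoff,
			       uint64_t *pfn)
{
	assert(ranges != NULL && pfn != NULL);

	for (size_t i = 0; i < nr_ranges; i++) {
		uint64_t nr_pages = ranges[i].len >> SKETCH_PAGE_SHIFT;

		if (pgoff < nr_pages) {
			/* Offset falls inside this range. */
			*pfn = (ranges[i].paddr >> SKETCH_PAGE_SHIFT) + pgoff;
			return 0;
		}
		/* Skip this range and make pgoff relative to the next one. */
		pgoff -= nr_pages;
	}
	return -1;
}
```

With two discontiguous ranges, consecutive page offsets in the VMA land on non-consecutive PFNs, which a single linear base + vm_pgoff calculation cannot express.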
I actually would like to go the other way and have VFIO always have a DMABUF under the VMA's it mmaps because that will make it easy to finish the type1 emulation which requires finding dmabufs for the VMAs.
It can be that you want additional checks (e.g. if the DMA-buf is revoked) in which case you would need to override the vma->vm_ops, but then just do the access checks and call the vfio_pci_mmap_ops to get the actual page fault handling done.
It isn't that simple, the vm_ops won't have a way to get back to the dmabuf from the vma to find the per-fd revoke flag to check it.
Sounds like the suggestion is just to reuse vfio_pci_mmap_*fault(), i.e. install "interposer" vm_ops for some new 'fault_but_check_revoke()' to then call down to the existing vfio_pci_mmap_*fault(), after fishing the DMABUF out of vm_private_data. (Like the proposed vfio_pci_dma_buf_mmap_huge_fault() does.)
Putting aside for a moment the above point of needing a new .fault() able to find a PFN for >1 range, how would the test of the revoked flag work w.r.t. synchronisation and protecting against a racing revoke? It's not safe to take memory_lock, test revoked, unlock, then hand over to the existing vfio_pci_mmap_*fault() -- which re-takes the lock, leaving a window in which a revoke can land between the check and the mapping. I'm not quite seeing how we could reuse the existing vfio_pci_mmap_*fault(), TBH. I did briefly consider refactoring that existing .fault() code, but that makes both paths uglier.
To summarise, I think we still
 - need a new fops->mmap() to link vfio_pci_dma_buf into vm_private_data, and determine WC attrs
 - need a new vm_ops->fault() to test dmabuf->revoked/status and determine map vs fault with memory_lock held, and to determine the PFN from >1 DMABUF ranges
unmap_mapping_range(priv->dmabuf->file->f_mapping, 0, priv->size, 1);

When you need to use unmap_mapping_range() then you usually share the address space object between the file descriptor exporting the DMA-buf and the DMA-buf fd itself.
Yeah, this becomes problematic. Right now there is a single address space per vfio-device and the invalidation is global.
Possibly for this use case you can keep that and do a global unmap and rely on fault to restore the mmaps that were not revoked.
Hm, that'd be functional, but we should consider huge BARs with a lot of PTEs (even huge ones); zapping all BARs might noticeably disturb other clients. But see my query below please, if we could zap just the resource being reclaimed that would be preferable.
Otherwise functions like vfio_pci_zap_bars() don't work correctly any more and that usually creates a huge bunch of problems.
I'd reasoned it was OK for the DMABUF to have its own unique address space -- even though IIUC that means an unmap_mapping_range() by vfio_pci_core_device won't affect a DMABUF's mappings -- because anything that needs to zap a BAR _also_ must already plan to notify DMABUF importers via vfio_pci_dma_buf_move(). And then, vfio_pci_dma_buf_move() will zap the mappings.
Are there paths that _don't_ always pair vfio_pci_zap_bars() with a vfio_pci_dma_buf_move()?
I'm sure I'm missing something, so question phrased as a statement: The only way that mappings could be missed would be if some path forgets to ...buf_move() when zapping the BARs, but that'd be a problem for importers regardless of whether they can now also be mmap()ed, no?
I don't want to flout convention for the sake of it, and am keen to learn more, so please gently explain in more detail: Why must we associate the DMABUFs with the VFIO address space [by sharing the AS object between the VFIO fd exporting the DMABUF and the DMABUF fd]?
Many thanks,
Matt