On Fri, Feb 27, 2026 at 01:52:15PM -0800, Alex Mastro wrote:
On Fri, Feb 27, 2026 at 03:48:07PM -0400, Jason Gunthorpe wrote:
I would actually like to go the other way and have VFIO always place a DMABUF under the VMAs it mmaps, because that will make it easy to finish the type1 emulation, which requires finding the dmabufs for the VMAs.
This is an even better idea, since it avoids duplicating the VMA flow into two parts.
I suppose this would also compose with your idea of using dma-buf for iommufd_compat support of VFIO_IOMMU_MAP_DMA on mmap()s backed by a VFIO device fd [1]? Instead of needing to materialize a new dma-buf, you could use the existing backing one?
Yeah, that too
I think it is a fairly easy progression:
1) mmap_prepare() allocates a new dmabuf (a struct file *) and sticks it in desc->vm_file. Rework things so all the vma_ops use a vm_file that is a dmabuf. The allocated dmabuf has a singleton range.
2) Teach the fault handlers to support full range semantics.
3) Use the dmabuf revoke variables/etc. in the mmap fault handlers.
4) Move the address space from the vfio to the dmabuf.
5) Allow mmapping the dmabuf fd directly, which is now only a couple of lines.
I forget how all the different mmap implementations in vfio interact, but I think the above works for vfio-pci.
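The progression above can be sketched as a small userspace model. This is not kernel code. `struct model_dmabuf`, `struct model_range`, `model_fault()` and the `MODEL_FAULT_*` codes are hypothetical stand-ins and are not VFIO or dma-buf APIs. The model assumes the dmabuf hung off `vm_file` carries an array of PFN ranges plus a revoked flag. With that in place, one fault handler covers three of the steps:

- the singleton range from step 1,
- the full range semantics from step 2,
- the revoke check from step 3.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for kernel fault return codes. */
enum model_fault { MODEL_FAULT_OK = 0, MODEL_FAULT_SIGBUS = 1 };

/* One physically contiguous chunk, measured in pages. */
struct model_range {
	uint64_t start_pfn;
	uint64_t npages;
};

/* Hypothetical dmabuf-like object that would hang off vma->vm_file. */
struct model_dmabuf {
	const struct model_range *ranges;
	size_t nr_ranges;
	bool revoked; /* set when the exporter revokes access (step 3) */
};

/*
 * Translate a page offset within the mapping into a PFN by walking the
 * ranges in order (step 2). A singleton range (step 1) is simply the
 * nr_ranges == 1 case of the same walk.
 */
static enum model_fault model_fault(const struct model_dmabuf *buf,
				    uint64_t pgoff, uint64_t *pfn_out)
{
	assert(buf != NULL && pfn_out != NULL);
	if (buf->revoked)
		return MODEL_FAULT_SIGBUS;

	for (size_t i = 0; i < buf->nr_ranges; i++) {
		const struct model_range *r = &buf->ranges[i];

		if (pgoff < r->npages) {
			*pfn_out = r->start_pfn + pgoff;
			return MODEL_FAULT_OK;
		}
		pgoff -= r->npages; /* skip past this range */
	}
	return MODEL_FAULT_SIGBUS; /* offset beyond the end of the buffer */
}
```

In this model, an mmap of a whole BAR would use a one-entry range array. An exported dmabuf with several ranges would go through the same `model_fault()` unchanged. That sharing is what would make step 5 cheap.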
Jason
On 2/27/26 23:04, Jason Gunthorpe wrote:
- mmap_prepare() allocates a new dmabuf file * and sticks it in desc->vm_file. Rework so all the vma_ops use a vm_file that is a dmabuf. The allocated dmabuf has a singleton range.
Interesting approach to fix this, but I would suggest something even simpler:
Use the same structure as a base class for the vma->vm_file->private_data object of both the VFIO file and the DMA-buf file.
For the DMA-buf file, that object contains the real ranges it exposes and points to the exporting VFIO device. For the VFIO file, it is just a dummy covering the whole range and pointing to itself.
This way you should be able to use the same vm_operations_struct for VMAs mapped through both the DMA-buf and the VFIO file descriptors.
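Here is a minimal sketch of the shared base-class idea. It assumes a hypothetical `struct model_mmap_base` stored in `vm_file->private_data`, with a single range per object for brevity. None of these names come from VFIO or dma-buf.

- The VFIO object is a dummy that covers the whole BAR and has `owner` pointing to itself.
- The DMA-buf object holds the real exported sub-range and points at the VFIO object.

Because both objects share the same layout, one fault routine serves both.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical shared "base class" stored in vm_file->private_data. */
struct model_mmap_base {
	struct model_mmap_base *owner; /* exporting VFIO device object */
	uint64_t start_pfn;            /* single range, for brevity */
	uint64_t npages;
};

/* VFIO file: a dummy covering the whole BAR and pointing to itself. */
static void model_init_vfio(struct model_mmap_base *vfio,
			    uint64_t bar_pfn, uint64_t bar_pages)
{
	vfio->owner = vfio;
	vfio->start_pfn = bar_pfn;
	vfio->npages = bar_pages;
}

/* DMA-buf file: the real exported sub-range, pointing at the exporter. */
static int model_init_dmabuf(struct model_mmap_base *buf,
			     struct model_mmap_base *vfio,
			     uint64_t first_page, uint64_t npages)
{
	/* Reject sub-ranges that do not fit inside the exporter's range. */
	if (first_page > vfio->npages || npages > vfio->npages - first_page)
		return -1;
	buf->owner = vfio;
	buf->start_pfn = vfio->start_pfn + first_page;
	buf->npages = npages;
	return 0;
}

/* One fault path shared by both file types (same vm_operations_struct). */
static int model_shared_fault(const struct model_mmap_base *base,
			      uint64_t pgoff, uint64_t *pfn)
{
	assert(base->owner != NULL);
	if (pgoff >= base->npages)
		return -1;
	*pfn = base->start_pfn + pgoff;
	return 0;
}
```

The fault path only reads the base fields, so in this model both kinds of VMA could share a single vm_operations_struct. The `owner` pointer is where a real implementation would presumably reach the exporter for locking or revocation.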
Independent of how you implement this, one additional warning: huge_fault has caused a number of really hard-to-debug problems on x86.
As far as I know, the background is that on x86 pte_special() only works on true leaf PTEs but not on PMDs/PUDs.
That in turn results in some nasty surprises when your PFNs are potentially backed by struct pages, e.g. for direct I/O. For example, on the resulting mmap() get_user_pages_fast() works, but get_user_pages() doesn't.
I hope those problems aren't applicable here, but if they are, Thomas from the Intel Xe team can give you more details on that stuff.
Regards, Christian.
On Mon, Mar 02, 2026 at 11:07:41AM +0100, Christian König wrote:
As far as I know, the background is that on x86 pte_special() only works on true leaf PTEs but not on PMDs/PUDs.
This is not the case: there are pmd_special() and pud_special() as well, protected by CONFIG_xx.
For example, the arch should not define CONFIG_ARCH_SUPPORTS_PMD_PFNMAP if vmf_insert_pfn_pmd() doesn't result in pmd_special() working.
e.g.:

    vmf_insert_pfn_pmd()
      insert_pmd()
        if (fop.is_folio) {
            // Not Taken
        } else {
            entry = pmd_mkhuge(pfn_pmd(fop.pfn, prot));
            entry = pmd_mkspecial(entry);
This stuff was all put together by Peter specifically for VFIO to use; AFAIK it is correct.
IDK what Thomas was using, but if you tried to do huge faults before all of this was built, it definitely would not have worked right, as it only supported a folio-backed path.
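Here is a toy model of why the special bit matters for huge PFN mappings. It rests on two assumptions, both in line with the discussion above:

- GUP-fast walks page tables without the VMA, so it must skip entries marked special.
- The slow get_user_pages() path checks the VMA and refuses VM_PFNMAP mappings.

The single `arch_huge_special` flag is a hypothetical stand-in for the CONFIG_ checks, and all `model_*` names are illustrative only.

If an arch cannot mark PMD leaves special, GUP-fast accepts the mapping while the slow path rejects it. That is the mismatch reported for x86 earlier in the thread. With pmd_mkspecial() applied, as in the insert_pmd() excerpt, both paths agree.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy page-table entry: only the bits this model cares about. */
struct model_entry {
	uint64_t pfn;
	bool huge;    /* PMD/PUD leaf rather than a PTE */
	bool special; /* pte/pmd/pud_special(): no struct page to refcount */
};

/*
 * Toy version of installing a PFN-map (non-folio) leaf, mirroring the
 * else-branch of insert_pmd(). PTE-level special is assumed to always
 * work. arch_huge_special models whether the arch can also mark
 * PMD-level leaves special.
 */
static struct model_entry model_insert_pfn(uint64_t pfn, bool huge,
					   bool arch_huge_special)
{
	struct model_entry e = { .pfn = pfn, .huge = huge };

	e.special = huge ? arch_huge_special : true;
	return e;
}

/* GUP-fast has no VMA, so the special bit is its only signal to back off. */
static bool model_gup_fast_ok(struct model_entry e)
{
	return !e.special;
}

/* The slow path sees the VMA and refuses VM_PFNMAP mappings outright. */
static bool model_gup_slow_ok(bool vma_is_pfnmap)
{
	return !vma_is_pfnmap;
}
```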
Jason
On 3/2/26 13:54, Jason Gunthorpe wrote:
This stuff was all put together by Peter specifically for VFIO to use; AFAIK it is correct.
Oh, that is really nice to know; thanks for the information. It means we could give that approach another try.
IDK what Thomas was using, but if you tried to do huge faults before all of this was built, it definitely would not have worked right, as it only supported a folio-backed path.
Yeah, Thomas tried that about 6 years ago, and my educated guess is that the whole infrastructure just wasn't there at that time.
Christian.
linaro-mm-sig@lists.linaro.org