On Mon, Oct 20, 2025 at 09:58:54AM -0300, Jason Gunthorpe wrote:
On Mon, Oct 20, 2025 at 05:27:02AM -0700, Christoph Hellwig wrote:
On Fri, Oct 17, 2025 at 08:53:20AM -0300, Jason Gunthorpe wrote:
On Thu, Oct 16, 2025 at 11:30:06PM -0700, Christoph Hellwig wrote:
On Mon, Oct 13, 2025 at 06:26:03PM +0300, Leon Romanovsky wrote:
The DMA API now has a new flow, and has gained phys_addr_t support, so it no longer needs struct pages to perform P2P mapping.
That's news to me. All the pci_p2pdma_map_state machinery is still based on pgmaps and thus pages.
We had this discussion already three months ago:
https://lore.kernel.org/all/20250729131502.GJ36037@nvidia.com/
These couple of patches make the core pci_p2pdma_map_state machinery work on struct p2pdma_provider, and pgmap is just one way to get a p2pdma_provider *.
The struct page paths through pgmap go page->pgmap->mem to get p2pdma_provider.
The non-struct page paths just have a p2pdma_provider * without a pgmap. In this series VFIO uses
- *provider = pcim_p2pdma_provider(pdev, bar);
To get the provider for a specific BAR.
And what protects that lifetime? I've not seen anyone actually building the proper lifetime management. And if someone did, the patches need to clearly point to that.
It is this series!
The above API gives a lifetime that is driver bound. The calling driver must ensure it stops using the provider and stops doing DMA with it before remove() completes.
This VFIO series does that through the move_notify callchain I showed in the previous email. This callchain is always triggered before remove() of the VFIO PCI driver is completed.
I think I've answered this three times now - for DMABUF the DMABUF invalidation scheme is used to control the lifetime and no DMA mapping outlives the provider, and the provider doesn't outlive the driver.
How?
I explained it in detail in the message you are replying to. If something is not clear, can you please be more specific?
Is it the mmap in VFIO perhaps that is causing these questions?
VFIO uses a PFNMAP VMA, so you can't pin_user_page() it. It uses unmap_mapping_range() during its remove() path to get rid of the VMA PTEs.
The DMA activity doesn't use the mmap *at all*. It isn't like NVMe, which relies on the ZONE_DEVICE pages and VMAs to link drivers together.
Instead the DMABUF FD is used to pass the MMIO pages between VFIO and another driver. DMABUF has a built in invalidation mechanism that VFIO triggers before remove(). The invalidation removes access from the other driver.
This is different than NVMe which has no invalidation. NVMe does unmap_mapping_range() on the VMA and waits for all the short lived pgmap references to clear. We don't need anything like that because DMABUF invalidation is synchronous.
The full picture for VFIO is something like:
[startup]
  MMIO is acquired from the pci_resource
  p2p_providers are setup

[runtime]
  MMIO is mapped into PFNMAP VMAs
  MMIO is linked to a DMABUF FD
  DMABUF FD gets DMA mapped using the p2p_provider

[unplug]
  unmap_mapping_range() is called so all VMAs are emptied out and the fault handler prevents new PTEs
    ** No access to the MMIO through VMAs is possible **

  vfio_pci_dma_buf_cleanup() is called which prevents new DMABUF mappings from starting, and does dma_buf_move_notify() on all the open DMABUF FDs to invalidate other drivers. Other drivers stop doing DMA and we need to free the IOVA from the IOMMU/etc.
    ** No DMA access from other drivers is possible now **

  Any still open DMABUF FD will fail inside VFIO immediately due to the priv->revoked checks.
    ** No code touches the p2p_provider anymore **

  The p2p_provider is destroyed by devm.
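
To make the ordering concrete, here is a rough user-space model of the unplug sequence. It is only a sketch, not the VFIO code: every name in it (model_provider, model_dmabuf, model_map, and so on) is made up for illustration, the revoked flag merely stands in for the priv->revoked check, and the synchronous invalidation is reduced to clearing a counter. The one thing it tries to show is that revoke plus invalidate happens before the provider goes away, so no mapping can outlive it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration; none of these are kernel types. */
struct model_provider {
	bool alive;             /* becomes false once "devm" destroys it */
};

struct model_dmabuf {
	struct model_provider *provider;
	bool revoked;           /* stands in for the priv->revoked check */
	int active_mappings;    /* DMA mappings currently held by importers */
};

/* An importer asks for a DMA mapping: 0 on success, -1 if refused. */
static int model_map(struct model_dmabuf *buf)
{
	if (buf->revoked)
		return -1;      /* refused before the provider is touched */
	assert(buf->provider->alive);
	buf->active_mappings++;
	return 0;
}

/* Synchronous invalidation: all importers have unmapped when this returns. */
static void model_move_notify(struct model_dmabuf *buf)
{
	buf->active_mappings = 0;
}

/* Cleanup step: block new mappings first, then invalidate existing ones. */
static void model_cleanup(struct model_dmabuf *buf)
{
	buf->revoked = true;
	model_move_notify(buf);
}

/*
 * Driver remove(): cleanup runs to completion, then the provider is torn
 * down. Returns how many mappings outlived the provider (expected: 0).
 */
static int model_remove(struct model_dmabuf *buf)
{
	int leaked;

	model_cleanup(buf);
	leaked = buf->active_mappings;
	buf->provider->alive = false;   /* "destroyed by devm" */
	return leaked;
}
```

In this model a still-open buffer can call model_map() after model_remove() and simply gets -1 back, without ever dereferencing the dead provider, which is the property the revoked check is there to give.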
Obviously you cannot use the new p2pdma_provider mechanism without some kind of protection against use after hot unplug, but it doesn't have to be struct page based.
And how does this interact with everyone else expecting pgmap based lifetime management?
They continue to use pgmap and nothing changes for them.
The pgmap path always waited until nothing was using the pgmap and thus provider before allowing device driver remove() to complete.
The refactoring doesn't change the lifecycle model, it just provides entry points to access the driver bound lifetime model directly instead of being forced to use pgmap.
Leon, can you add some remarks to the comments about what the rules are for calling pcim_p2pdma_provider()?
Yes, sure.
Thanks
Jason