On 11/19/25 14:42, Leon Romanovsky wrote:
On Wed, Nov 19, 2025 at 02:16:57PM +0100, Christian König wrote:
On 11/11/25 10:57, Leon Romanovsky wrote:
From: Leon Romanovsky <leonro@nvidia.com>
Add dma_buf_map() and dma_buf_unmap() helpers to convert an array of MMIO physical address ranges into scatter-gather tables with proper DMA mapping.
These common functions are a starting point and support any PCI drivers creating mappings from their BAR's MMIO addresses. VFIO is one user, and RDMA will follow shortly. Existing DRM drivers can be reviewed and refactored separately. We hope this will evolve into common DRM routines that handle mixed CPU and MMIO mappings.
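For orientation, here is a minimal sketch of how an exporter might wire the new helper into its map callback. Everything prefixed with my_ is made up for illustration, and the .paddr field name of struct dma_buf_phys_vec is assumed (only .len is visible in the hunk below); only dma_buf_map() itself comes from this patch.

	/* Hypothetical exporter callback built on dma_buf_map(). */
	static struct sg_table *my_exporter_map_dma_buf(struct dma_buf_attachment *attach,
							enum dma_data_direction dir)
	{
		struct my_exporter *priv = attach->dmabuf->priv;
		struct dma_buf_phys_vec vec = {
			.paddr = priv->bar_phys,	/* MMIO physical address of the BAR range */
			.len   = priv->bar_len,
		};

		/* One MMIO range; the total size equals the single range length here. */
		return dma_buf_map(attach, priv->provider, &vec, 1, vec.len, dir);
	}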
Compared to the dma_map_resource() abuse, this implementation handles the complicated PCI P2P scenarios properly, especially when an IOMMU is enabled:

 - Direct bus address mapping without IOVA allocation for PCI_P2PDMA_MAP_BUS_ADDR, using pci_p2pdma_bus_addr_map(). This happens if the IOMMU is enabled but the PCIe switch ACS flags allow transactions to avoid the host bridge.

   Further, this handles the slightly obscure case of MMIO with a phys_addr_t that is different from the physical BAR programming (bus offset). The phys_addr_t is converted to a dma_addr_t that accounts for this offset, which lets certain real systems work, especially ARM platforms (see the sketch after this list).

 - Mapping through the host bridge with IOVA allocation and the DMA_ATTR_MMIO attribute for MMIO memory regions (PCI_P2PDMA_MAP_THRU_HOST_BRIDGE). This happens when the IOMMU is enabled and the ACS flags force all traffic through the IOMMU - i.e. for virtualization systems.

 - Cases where P2P is not supported through the host bridge/CPU. The P2P subsystem is the proper place to detect this and block it.
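To make the bus-offset case above concrete, a minimal sketch of the phys_addr_t to bus-address conversion; the structure and field names here are illustrative only, not the actual p2pdma API:

	/*
	 * Illustrative only: on some (often ARM) systems the CPU physical
	 * address of a BAR differs from the address programmed into the BAR
	 * itself, so peer devices must use phys + bus_offset.
	 */
	struct my_p2p_provider {
		s64 bus_offset;		/* bus address minus CPU physical address */
	};

	static dma_addr_t my_p2p_bus_addr(struct my_p2p_provider *provider,
					  phys_addr_t phys)
	{
		/*
		 * In the PCI_P2PDMA_MAP_BUS_ADDR case no IOVA is allocated:
		 * the peer targets the BAR directly, so the dma_addr_t handed
		 * to the importer is simply the bus address of the BAR.
		 */
		return (dma_addr_t)(phys + provider->bus_offset);
	}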
Helper functions fill_sg_entry() and calc_sg_nents() handle the scatter-gather table construction, splitting large regions into UINT_MAX-sized chunks to fit within sg->length field limits.
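As a worked example of that chunking (values chosen for illustration):

	/*
	 * A 6 GiB MMIO range cannot be described by one 32-bit sg_dma_len(),
	 * so it is split into DIV_ROUND_UP(length, UINT_MAX) entries.
	 */
	size_t length = 6ULL << 30;				/* 6 GiB */
	unsigned int nents = DIV_ROUND_UP(length, UINT_MAX);	/* == 2 */
	/* entry 0: UINT_MAX bytes, entry 1: length - UINT_MAX bytes */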
Since the physical address based DMA API forbids use of the CPU list of the scatterlist, this produces a mangled scatterlist with a fully zeroed and NULL'd CPU list: the list is zero length and all the struct page pointers are NULL. This is stronger and more robust than the existing mangle_sg_table() technique. Migrating DMABUF as a subsystem away from using scatterlist for this data structure is a future project.
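For clarity, a short sketch of how an importer is expected to consume such a DMA-only table; the sgt variable is assumed to be the table returned by dma_buf_map():

	struct scatterlist *sg;
	unsigned int i;

	for_each_sgtable_dma_sg(sgt, sg, i) {
		dma_addr_t addr = sg_dma_address(sg);
		unsigned int len = sg_dma_len(sg);

		/*
		 * Program the device with addr/len. The CPU side of the
		 * entry (sg_page()/sg->length) is NULL/zero on purpose and
		 * must not be touched.
		 */
	}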
Tested-by: Alex Mastro <amastro@fb.com>
Tested-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
 drivers/dma-buf/dma-buf.c | 235 ++++++++++++++++++++++++++++++++++++++++++++++
 include/linux/dma-buf.h   |  18 ++++
 2 files changed, 253 insertions(+)
diff --git a/drivers/dma-buf/dma-buf.c b/drivers/dma-buf/dma-buf.c
index 2bcf9ceca997..cb55dff1dad5 100644
--- a/drivers/dma-buf/dma-buf.c
+++ b/drivers/dma-buf/dma-buf.c
@@ -1254,6 +1254,241 @@ void dma_buf_unmap_attachment_unlocked(struct dma_buf_attachment *attach,
 }
 EXPORT_SYMBOL_NS_GPL(dma_buf_unmap_attachment_unlocked, "DMA_BUF");
 
+static struct scatterlist *fill_sg_entry(struct scatterlist *sgl, size_t length,
+					 dma_addr_t addr)
+{
+	unsigned int len, nents;
+	int i;
+
+	nents = DIV_ROUND_UP(length, UINT_MAX);
+	for (i = 0; i < nents; i++) {
+		len = min_t(size_t, length, UINT_MAX);
+		length -= len;
+		/*
+		 * DMABUF abuses scatterlist to create a scatterlist
+		 * that does not have any CPU list, only the DMA list.
+		 * Always set the page related values to NULL to ensure
+		 * importers can't use it. The phys_addr based DMA API
+		 * does not require the CPU list for mapping or unmapping.
+		 */
+		sg_set_page(sgl, NULL, 0, 0);
+		sg_dma_address(sgl) = addr + i * UINT_MAX;
+		sg_dma_len(sgl) = len;
+		sgl = sg_next(sgl);
+	}
+
+	return sgl;
+}
+
+static unsigned int calc_sg_nents(struct dma_iova_state *state,
+				  struct dma_buf_phys_vec *phys_vec,
+				  size_t nr_ranges, size_t size)
+{
+	unsigned int nents = 0;
+	size_t i;
+
+	if (!state || !dma_use_iova(state)) {
+		for (i = 0; i < nr_ranges; i++)
+			nents += DIV_ROUND_UP(phys_vec[i].len, UINT_MAX);
+	} else {
+		/*
+		 * In IOVA case, there is only one SG entry which spans
+		 * for whole IOVA address space, but we need to make sure
+		 * that it fits sg->length, maybe we need more.
+		 */
+		nents = DIV_ROUND_UP(size, UINT_MAX);
+	}
+
+	return nents;
+}
+
+/**
+ * struct dma_buf_dma - holds DMA mapping information
+ * @sgt:   Scatter-gather table
+ * @state: DMA IOVA state relevant in IOMMU-based DMA
+ * @size:  Total size of DMA transfer
+ */
+struct dma_buf_dma {
+	struct sg_table sgt;
+	struct dma_iova_state *state;
+	size_t size;
+};
+
+/**
+ * dma_buf_map - Returns the scatterlist table of the attachment from arrays
+ * of physical vectors. This function is intended for MMIO memory only.
+ * @attach:	[in]	attachment whose scatterlist is to be returned
+ * @provider:	[in]	p2pdma provider
+ * @phys_vec:	[in]	array of physical vectors
+ * @nr_ranges:	[in]	number of entries in phys_vec array
+ * @size:	[in]	total size of phys_vec
+ * @dir:	[in]	direction of DMA transfer
+ *
+ * Returns sg_table containing the scatterlist to be returned; returns ERR_PTR
+ * on error. May return -EINTR if it is interrupted by a signal.
+ *
+ * On success, the DMA addresses and lengths in the returned scatterlist are
+ * PAGE_SIZE aligned.
+ *
+ * A mapping must be unmapped by using dma_buf_unmap().
+ */
+struct sg_table *dma_buf_map(struct dma_buf_attachment *attach,
That is clearly not a good name for this function. We already have overloaded the term *mapping* with something completely different.
This function performs DMA mapping, so what name do you suggest instead of dma_buf_map()?
Something like dma_buf_phys_vec_to_sg_table(). I'm not good at naming either.
+			     struct p2pdma_provider *provider,
+			     struct dma_buf_phys_vec *phys_vec,
+			     size_t nr_ranges, size_t size,
+			     enum dma_data_direction dir)
+{
+	unsigned int nents, mapped_len = 0;
+	struct dma_buf_dma *dma;
+	struct scatterlist *sgl;
+	dma_addr_t addr;
+	size_t i;
+	int ret;
+
+	dma_resv_assert_held(attach->dmabuf->resv);
+
+	if (WARN_ON(!attach || !attach->dmabuf || !provider))
+		/* This function is supposed to work on MMIO memory only */
+		return ERR_PTR(-EINVAL);
+
+	dma = kzalloc(sizeof(*dma), GFP_KERNEL);
+	if (!dma)
+		return ERR_PTR(-ENOMEM);
+
+	switch (pci_p2pdma_map_type(provider, attach->dev)) {
+	case PCI_P2PDMA_MAP_BUS_ADDR:
+		/*
+		 * There is no need in IOVA at all for this flow.
+		 */
+		break;
+	case PCI_P2PDMA_MAP_THRU_HOST_BRIDGE:
+		dma->state = kzalloc(sizeof(*dma->state), GFP_KERNEL);
+		if (!dma->state) {
+			ret = -ENOMEM;
+			goto err_free_dma;
+		}
+
+		dma_iova_try_alloc(attach->dev, dma->state, 0, size);

Oh, that is a clear no-go for the core DMA-buf code.
It's intentionally up to the exporter how to create the DMA addresses the importer can work with.
I didn't fully understand that email either. The importer needs to configure the DMA, and it supports only MMIO addresses. The exporter controls it by asking for peer2peer.
I misinterpreted the call to pci_p2pdma_map_type() here as meaning that the DMA-buf code now decides whether transactions go over the root complex or not.
But the exporter can call pci_p2pdma_map_type() even before calling this function, so that looks fine to me.
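For illustration, a minimal sketch of such an exporter-side check, reusing the variables from the hunk above (error handling abbreviated; this is a sketch of the idea, not code from the patch):

	switch (pci_p2pdma_map_type(provider, attach->dev)) {
	case PCI_P2PDMA_MAP_BUS_ADDR:
	case PCI_P2PDMA_MAP_THRU_HOST_BRIDGE:
		/* P2P is possible, let the helper build the DMA-only sg_table */
		sgt = dma_buf_map(attach, provider, phys_vec, nr_ranges, size, dir);
		break;
	default:
		/* P2P between these two devices is not supported */
		sgt = ERR_PTR(-EOPNOTSUPP);
		break;
	}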
Regards, Christian.
Thanks