On 11/19/25 14:42, Leon Romanovsky wrote:
On Wed, Nov 19, 2025 at 02:16:57PM +0100, Christian König wrote:
On 11/11/25 10:57, Leon Romanovsky wrote:
From: Leon Romanovsky <leonro@nvidia.com>
Add dma_buf_map() and dma_buf_unmap() helpers to convert an array of MMIO physical address ranges into scatter-gather tables with proper DMA mapping.
These common functions are a starting point and support any PCI drivers creating mappings from their BAR's MMIO addresses. VFIO is one user, and RDMA will follow shortly. Existing DRM drivers can be reviewed and refactored separately. We hope this will evolve into common DRM routines that handle mixed CPU and MMIO mappings.
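For orientation, here is a minimal sketch of how an exporter might wire the new helper into its map callback. Everything prefixed with my_ is made up for illustration, and the .paddr field name of struct dma_buf_phys_vec is assumed (only .len is visible in the hunk below); only dma_buf_map() itself comes from this patch.

	/* Hypothetical exporter callback built on dma_buf_map(). */
	static struct sg_table *my_exporter_map_dma_buf(struct dma_buf_attachment *attach,
							enum dma_data_direction dir)
	{
		struct my_exporter *priv = attach->dmabuf->priv;
		struct dma_buf_phys_vec vec = {
			.paddr = priv->bar_phys,	/* MMIO physical address of the BAR range */
			.len   = priv->bar_len,
		};

		/* One MMIO range; the total size equals the single range length here. */
		return dma_buf_map(attach, priv->provider, &vec, 1, vec.len, dir);
	}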
Compared to the dma_map_resource() abuse, this implementation handles the complicated PCI P2P scenarios properly, especially when an IOMMU is enabled:

 - Direct bus address mapping without IOVA allocation for PCI_P2PDMA_MAP_BUS_ADDR, using pci_p2pdma_bus_addr_map(). This happens if the IOMMU is enabled but the PCIe switch ACS flags allow transactions to avoid the host bridge.

   Further, this handles the slightly obscure case of MMIO with a phys_addr_t that is different from the physical BAR programming (bus offset). The phys_addr_t is converted to a dma_addr_t that accounts for this offset, which lets certain real systems work, especially ARM platforms (see the sketch after this list).

 - Mapping through the host bridge with IOVA allocation and the DMA_ATTR_MMIO attribute for MMIO memory regions (PCI_P2PDMA_MAP_THRU_HOST_BRIDGE). This happens when the IOMMU is enabled and the ACS flags force all traffic through the IOMMU - i.e. for virtualization systems.

 - Cases where P2P is not supported through the host bridge/CPU. The P2P subsystem is the proper place to detect this and block it.
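To make the bus-offset case above concrete, a minimal sketch of the phys_addr_t to bus-address conversion; the structure and field names here are illustrative only, not the actual p2pdma API:

	/*
	 * Illustrative only: on some (often ARM) systems the CPU physical
	 * address of a BAR differs from the address programmed into the BAR
	 * itself, so peer devices must use phys + bus_offset.
	 */
	struct my_p2p_provider {
		s64 bus_offset;		/* bus address minus CPU physical address */
	};

	static dma_addr_t my_p2p_bus_addr(struct my_p2p_provider *provider,
					  phys_addr_t phys)
	{
		/*
		 * In the PCI_P2PDMA_MAP_BUS_ADDR case no IOVA is allocated:
		 * the peer targets the BAR directly, so the dma_addr_t handed
		 * to the importer is simply the bus address of the BAR.
		 */
		return (dma_addr_t)(phys + provider->bus_offset);
	}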
Helper functions fill_sg_entry() and calc_sg_nents() handle the scatter-gather table construction, splitting large regions into UINT_MAX-sized chunks to fit within sg->length field limits.
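As a worked example of that chunking (values chosen for illustration):

	/*
	 * A 6 GiB MMIO range cannot be described by one 32-bit sg_dma_len(),
	 * so it is split into DIV_ROUND_UP(length, UINT_MAX) entries.
	 */
	size_t length = 6ULL << 30;				/* 6 GiB */
	unsigned int nents = DIV_ROUND_UP(length, UINT_MAX);	/* == 2 */
	/* entry 0: UINT_MAX bytes, entry 1: length - UINT_MAX bytes */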
Since the physical address based DMA API forbids use of the CPU list of the scatterlist, this produces a mangled scatterlist with a fully zeroed and NULL'd CPU list: the list is zero length and all the struct page pointers are NULL. This is stronger and more robust than the existing mangle_sg_table() technique. Migrating DMABUF as a subsystem away from using scatterlist for this data structure is a future project.
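For clarity, a short sketch of how an importer is expected to consume such a DMA-only table; the sgt variable is assumed to be the table returned by dma_buf_map():

	struct scatterlist *sg;
	unsigned int i;

	for_each_sgtable_dma_sg(sgt, sg, i) {
		dma_addr_t addr = sg_dma_address(sg);
		unsigned int len = sg_dma_len(sg);

		/*
		 * Program the device with addr/len. The CPU side of the
		 * entry (sg_page()/sg->length) is NULL/zero on purpose and
		 * must not be touched.
		 */
	}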
Tested-by: Alex Mastro <amastro@fb.com>
Tested-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
 drivers/dma-buf/dma-buf.c | 235 ++++++++++++++++++++++++++++++++++++++++++++++
 include/linux/dma-buf.h   |  18 ++++
 2 files changed, 253 insertions(+)
diff --git a/drivers/dma-buf/dma-buf.c b/drivers/dma-buf/dma-buf.c
index 2bcf9ceca997..cb55dff1dad5 100644
--- a/drivers/dma-buf/dma-buf.c
+++ b/drivers/dma-buf/dma-buf.c
@@ -1254,6 +1254,241 @@ void dma_buf_unmap_attachment_unlocked(struct dma_buf_attachment *attach,
 }
 EXPORT_SYMBOL_NS_GPL(dma_buf_unmap_attachment_unlocked, "DMA_BUF");
 
+static struct scatterlist *fill_sg_entry(struct scatterlist *sgl, size_t length,
+					 dma_addr_t addr)
+{
+	unsigned int len, nents;
+	int i;
+
+	nents = DIV_ROUND_UP(length, UINT_MAX);
+	for (i = 0; i < nents; i++) {
+		len = min_t(size_t, length, UINT_MAX);
+		length -= len;
+		/*
+		 * DMABUF abuses scatterlist to create a scatterlist
+		 * that does not have any CPU list, only the DMA list.
+		 * Always set the page related values to NULL to ensure
+		 * importers can't use it. The phys_addr based DMA API
+		 * does not require the CPU list for mapping or unmapping.
+		 */
+		sg_set_page(sgl, NULL, 0, 0);
+		sg_dma_address(sgl) = addr + i * UINT_MAX;
+		sg_dma_len(sgl) = len;
+		sgl = sg_next(sgl);
+	}
+
+	return sgl;
+}
+
+static unsigned int calc_sg_nents(struct dma_iova_state *state,
+				  struct dma_buf_phys_vec *phys_vec,
+				  size_t nr_ranges, size_t size)
+{
+	unsigned int nents = 0;
+	size_t i;
+
+	if (!state || !dma_use_iova(state)) {
+		for (i = 0; i < nr_ranges; i++)
+			nents += DIV_ROUND_UP(phys_vec[i].len, UINT_MAX);
+	} else {
+		/*
+		 * In IOVA case, there is only one SG entry which spans
+		 * for whole IOVA address space, but we need to make sure
+		 * that it fits sg->length, maybe we need more.
+		 */
+		nents = DIV_ROUND_UP(size, UINT_MAX);
+	}
+
+	return nents;
+}
+
+/**
+ * struct dma_buf_dma - holds DMA mapping information
+ * @sgt:   Scatter-gather table
+ * @state: DMA IOVA state relevant in IOMMU-based DMA
+ * @size:  Total size of DMA transfer
+ */
+struct dma_buf_dma {
+	struct sg_table sgt;
+	struct dma_iova_state *state;
+	size_t size;
+};
+
+/**
+ * dma_buf_map - Returns the scatterlist table of the attachment from arrays
+ * of physical vectors. This function is intended for MMIO memory only.
+ * @attach:	[in]	attachment whose scatterlist is to be returned
+ * @provider:	[in]	p2pdma provider
+ * @phys_vec:	[in]	array of physical vectors
+ * @nr_ranges:	[in]	number of entries in phys_vec array
+ * @size:	[in]	total size of phys_vec
+ * @dir:	[in]	direction of DMA transfer
+ *
+ * Returns sg_table containing the scatterlist to be returned; returns ERR_PTR
+ * on error. May return -EINTR if it is interrupted by a signal.
+ *
+ * On success, the DMA addresses and lengths in the returned scatterlist are
+ * PAGE_SIZE aligned.
+ *
+ * A mapping must be unmapped by using dma_buf_unmap().
+ */
+struct sg_table *dma_buf_map(struct dma_buf_attachment *attach,
That is clearly not a good name for this function. We already have overloaded the term *mapping* with something completely different.
This function performs DMA mapping, so what name do you suggest instead of dma_buf_map()?
Something like dma_buf_phys_vec_to_sg_table(). I'm not good at naming either.
+			     struct p2pdma_provider *provider,
+			     struct dma_buf_phys_vec *phys_vec,
+			     size_t nr_ranges, size_t size,
+			     enum dma_data_direction dir)
+{
+	unsigned int nents, mapped_len = 0;
+	struct dma_buf_dma *dma;
+	struct scatterlist *sgl;
+	dma_addr_t addr;
+	size_t i;
+	int ret;
+
+	dma_resv_assert_held(attach->dmabuf->resv);
+
+	if (WARN_ON(!attach || !attach->dmabuf || !provider))
+		/* This function is supposed to work on MMIO memory only */
+		return ERR_PTR(-EINVAL);
+
+	dma = kzalloc(sizeof(*dma), GFP_KERNEL);
+	if (!dma)
+		return ERR_PTR(-ENOMEM);
+
+	switch (pci_p2pdma_map_type(provider, attach->dev)) {
+	case PCI_P2PDMA_MAP_BUS_ADDR:
+		/*
+		 * There is no need in IOVA at all for this flow.
+		 */
+		break;
+	case PCI_P2PDMA_MAP_THRU_HOST_BRIDGE:
+		dma->state = kzalloc(sizeof(*dma->state), GFP_KERNEL);
+		if (!dma->state) {
+			ret = -ENOMEM;
+			goto err_free_dma;
+		}
+
+		dma_iova_try_alloc(attach->dev, dma->state, 0, size);

Oh, that is a clear no-go for the core DMA-buf code.
It's intentionally up to the exporter how to create the DMA addresses the importer can work with.
I didn't fully understand that email either. The importer needs to configure the DMA, and it supports only MMIO addresses. The exporter controls it by asking for peer2peer.
I misinterpreted the call to pci_p2pdma_map_type() here as meaning that the DMA-buf code now decides whether transactions go over the root complex or not.
But the exporter can call pci_p2pdma_map_type() even before calling this function, so that looks fine to me.
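For illustration, a minimal sketch of such an exporter-side check, reusing the variables from the hunk above (error handling abbreviated; this is a sketch of the idea, not code from the patch):

	switch (pci_p2pdma_map_type(provider, attach->dev)) {
	case PCI_P2PDMA_MAP_BUS_ADDR:
	case PCI_P2PDMA_MAP_THRU_HOST_BRIDGE:
		/* P2P is possible, let the helper build the DMA-only sg_table */
		sgt = dma_buf_map(attach, provider, phys_vec, nr_ranges, size, dir);
		break;
	default:
		/* P2P between these two devices is not supported */
		sgt = ERR_PTR(-EOPNOTSUPP);
		break;
	}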
Regards, Christian.
Thanks