On 16.03.2026 13:08, Maxime Ripard wrote:
On Wed, Mar 11, 2026 at 08:18:28AM -0500, Andrew Davis wrote:
On 3/11/26 5:19 AM, Albert Esteve wrote:
On Tue, Mar 10, 2026 at 4:34 PM Andrew Davis afd@ti.com wrote:
On 3/6/26 4:36 AM, Albert Esteve wrote:
Expose DT coherent reserved-memory pools ("shared-dma-pool" without "reusable") as dma-buf heaps, creating one heap per region so userspace can allocate from the exact device-local pool intended for coherent DMA.
This is a missing backend in the long-term effort to steer userspace buffer allocations (DRM, v4l2, dma-buf heaps) through heaps for clearer cgroup accounting. CMA and system heaps already exist; non-reusable coherent reserved memory did not.
The heap binds the heap device to each memory region so coherent allocations use the correct dev->dma_mem, and it defers registration until module_init when normal allocators are available.
Signed-off-by: Albert Esteve aesteve@redhat.com
 drivers/dma-buf/heaps/Kconfig         |   9 +
 drivers/dma-buf/heaps/Makefile        |   1 +
 drivers/dma-buf/heaps/coherent_heap.c | 414 ++++++++++++++++++++++++++++++++++
 3 files changed, 424 insertions(+)
(...)
You are doing this DMA allocation using a non-DMA pseudo-device (heap_dev). This is why you need to do that dma_coerce_mask_and_coherent(64) nonsense, you are doing a DMA alloc for the CPU itself. This might still work, but only if dma_map_sgtable() can handle swiotlb/iommu for all attaching devices at map time.
The concern is valid. We're allocating via a synthetic device, which ties the allocation to that device's DMA domain. I looked deeper into this trying to address the concern.
The approach works because dma_map_sgtable() handles both dma_map_direct and use_dma_iommu cases in __dma_map_sg_attrs(). For each physical address in the sg_table (extracted via sg_phys()), it creates device-specific DMA mappings:
- For direct mapping: it checks if the address is directly accessible
(dma_capable()), and if not, it falls back to swiotlb.
- For IOMMU: it creates mappings that allow the device to access
physical addresses.
This means every attached device gets its own device-specific DMA mapping, properly handling cases where the physical addresses are inaccessible or have DMA constraints.
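The per-device dispatch described above can be sketched as a toy model. This is plain user-space C, not the kernel API: `struct model_dev`, `map_phys_for_dev()` and the mask values are invented for illustration, and only loosely mirror the real `dma_capable()`/swiotlb/IOMMU paths in `__dma_map_sg_attrs()`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy model -- NOT the kernel API. */
#define DMA_BIT_MASK(n) (((n) == 64) ? ~0ULL : ((1ULL << (n)) - 1))

struct model_dev {
	const char *name;
	uint64_t dma_mask;	/* highest bus address the device can reach */
	bool has_iommu;		/* stands in for use_dma_iommu() */
};

enum map_kind { MAP_DIRECT, MAP_BOUNCE, MAP_IOMMU };

/*
 * Decide, per attaching device, how one physical sg entry gets mapped:
 * IOMMU path first, otherwise direct mapping with a swiotlb fallback
 * when the address is out of the device's reach.
 */
enum map_kind map_phys_for_dev(const struct model_dev *dev, uint64_t phys)
{
	if (dev->has_iommu)
		return MAP_IOMMU;	/* remapped into the device's IOVA space */
	if (phys <= dev->dma_mask)
		return MAP_DIRECT;	/* dma_capable(): reachable as-is */
	return MAP_BOUNCE;		/* swiotlb bounce buffer */
}
```

The point of the sketch is only that the decision is made per attaching device at map time, independent of which device the buffer was originally allocated against.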
While this means it might still "work", it won't always be ideal. Take the case where the consuming device(s) have a 32-bit address restriction: if the allocation were done using the real devices, the backing buffer itself would be allocated in <32-bit memory. Here, the allocation could end up in >32-bit memory, since the CPU/synthetic device supports that, and each mapping device would instead get a bounce buffer.
(this example might not be great as we usually know the address of carveout/reserved memory regions, but substitute in whatever restriction makes more sense)
These non-reusable carveouts tend to be made for some specific device, and they are made specifically because that device has some memory restriction. So we might run into the situation above more often than one would expect.
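To make the concern concrete, here is a toy placement model (again plain user-space C, nothing kernel-specific; the 12 GiB `top_of_ram` and the "allocate as high as the allocator's mask permits" policy are invented purely for illustration):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define DMA_BIT_MASK(n) (((n) == 64) ? ~0ULL : ((1ULL << (n)) - 1))

/*
 * Toy allocator: hands out the highest page the allocating device's
 * DMA mask permits. Invented placement policy, not the kernel's.
 */
uint64_t toy_alloc_phys(uint64_t alloc_mask)
{
	const uint64_t top_of_ram = 0x2FFFFF000ULL;	/* assume 12 GiB RAM */

	return alloc_mask < top_of_ram ? (alloc_mask & ~0xFFFULL) : top_of_ram;
}

/* A consumer needs a bounce buffer when the buffer lies above its mask. */
bool needs_bounce(uint64_t phys, uint64_t consumer_mask)
{
	return phys > consumer_mask;
}
```

Allocating with the synthetic 64-bit device places the buffer high in RAM, so a 32-bit consumer bounces on every map; allocating with the real (32-bit) device's mask would have avoided that entirely.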
Not a blocker here, but just something worth thinking on.
As I detailed in the previous version [1], the main idea behind that work is to get rid of dma_alloc_attrs and have frameworks and drivers allocate from the heaps instead.
Robin was saying he wasn't comfortable with exposing this heap to userspace, and we're saying here that maybe this might not always work anyway (or at least that we couldn't test it fully).
Maybe the best thing is to defer this series until we are at a point where we can start enabling the "heap allocations" in frameworks then? Hopefully we will have hardware to test it with by then, and we might not even need to expose it to userspace at all but only to the kernel.
What do you think?
IMHO a good idea. Maybe an in-kernel heap for the coherent allocations will be just enough.
Best regards