Hi Ketil,
On Tue, 5 May 2026 16:05:07 +0200 Ketil Johnsen ketil.johnsen@arm.com wrote:
From: John Stultz jstultz@google.com
Add proper reference counting on the dma_heap structure. While existing heaps are built-in, we may eventually have heaps loaded from modules, and we'll need to be able to properly handle the references to the heaps
It's weird that this "heap as module" thing is mentioned here, but actual robustness to make this safe is not added in the commit or any of the following ones.
Signed-off-by: John Stultz jstultz@google.com Signed-off-by: T.J. Mercier tjmercier@google.com Signed-off-by: Yong Wu yong.wu@mediatek.com [Yong: Just add comment for "minor" and "refcount"] Signed-off-by: Yunfei Dong yunfei.dong@mediatek.com [Yunfei: Change reviewer's comments] Signed-off-by: Florent Tomasin florent.tomasin@arm.com [Florent: Rebase] Signed-off-by: Ketil Johnsen ketil.johnsen@arm.com [Ketil: Rebase]
drivers/dma-buf/dma-heap.c | 29 +++++++++++++++++++++++++++++ include/linux/dma-heap.h | 2 ++ 2 files changed, 31 insertions(+)
diff --git a/drivers/dma-buf/dma-heap.c b/drivers/dma-buf/dma-heap.c index ac5f8685a6494..9fd365ddbd517 100644 --- a/drivers/dma-buf/dma-heap.c +++ b/drivers/dma-buf/dma-heap.c @@ -12,6 +12,7 @@ #include <linux/dma-heap.h> #include <linux/err.h> #include <linux/export.h> +#include <linux/kref.h> #include <linux/list.h> #include <linux/nospec.h> #include <linux/syscalls.h> @@ -31,6 +32,7 @@
- @heap_devt: heap device node
- @list: list head connecting to list of heaps
- @heap_cdev: heap char device
*/
- @refcount: reference counter for this heap device
- Represents a heap of memory from which buffers can be made.
@@ -41,6 +43,7 @@ struct dma_heap { dev_t heap_devt; struct list_head list; struct cdev heap_cdev;
- struct kref refcount;
}; static LIST_HEAD(heap_list); @@ -248,6 +251,7 @@ struct dma_heap *dma_heap_add(const struct dma_heap_export_info *exp_info) if (!heap) return ERR_PTR(-ENOMEM);
- kref_init(&heap->refcount); heap->name = exp_info->name; heap->ops = exp_info->ops; heap->priv = exp_info->priv;
@@ -313,6 +317,31 @@ struct dma_heap *dma_heap_add(const struct dma_heap_export_info *exp_info) } EXPORT_SYMBOL_NS_GPL(dma_heap_add, "DMA_BUF_HEAP"); +static void dma_heap_release(struct kref *ref) +{
- struct dma_heap *heap = container_of(ref, struct dma_heap, refcount);
- unsigned int minor = MINOR(heap->heap_devt);
- mutex_lock(&heap_list_lock);
- list_del(&heap->list);
- mutex_unlock(&heap_list_lock);
- device_destroy(dma_heap_class, heap->heap_devt);
- cdev_del(&heap->heap_cdev);
- xa_erase(&dma_heap_minors, minor);
- kfree(heap);
That's actually problematic, because cdev_del() doesn't guarantee that all opened FDs have been closed [1], it just guarantees that no new ones can materialize. In order to make that safe, we'd need a
1. kref_get_unless_zero() in dma_heap_open(), with proper locking around the xa_load() to protect against the heap removal that's happening here 2. a dma_heap_put() in a new dma_heap_close() implementation 3. a guarantee that heap implementations won't go away until the last ref is dropped, which means ops and all the data needed for this heap to satisfy ioctl()s (and more generally every passed at dma_heap_add() time) have to stay valid until the last ref is dropped. Alternatively, we could restrict this only to in-flight ioctl()s, and have the ops replaced by some dummy ops using RCU or a rwlock. But I guess live dmabufs allocated on this heap have to retain the heap and its implementation anyway.
For record, #3 is already not satisfied by the current tee_heap implementation (tee_dma_heap objects can vanish before the dma_heap object is gone). The other implementations seem to be fine because they are statically linked, and they either have exp_info.priv set to NULL, or something that's never released.
TLDR; the whole assumption that adding refcounting to dma_heap is enough to guarantee safety around device/module removal is not holding, and adding in-kernel users acquiring dma_heap refs on top of this design is just going to make it even more painful to fix.
I see two way forward from here, either we get the dma_heap/dma_heap-producer lifetime right from the start the way I suggested above (I might have missed corner cases there BTW), or we keep assuming that heaps can only ever be created, never destroyed/removed (which is basically what the current dma_heap.c logic does, except tee_heap.c broke that), and just let dma_heap_find() return dma_heap pointers whose lifetime is assumed to be static.
+}
+/**
- dma_heap_put - drops a reference to a dmabuf heap, potentially freeing it
- @heap: DMA-Heap whose reference count to decrement
- */
+void dma_heap_put(struct dma_heap *heap) +{
- kref_put(&heap->refcount, dma_heap_release);
nit: I'd go
if (heap) kref_put(&heap->refcount, dma_heap_release);
so users can call dma_heap_put() on NULL heaps, which usually simplify error paths and/or destruction of partially initialized objects.
Regards,
Boris
[1]https://elixir.bootlin.com/linux/v7.0.1/source/fs/char_dev.c#L594