On 3/17/26 16:21, Boris Brezillon wrote:
On Tue, 17 Mar 2026 15:48:25 +0100 "Christian König" ckoenig.leichtzumerken@gmail.com wrote:
In case of a refcounting bug dma_fence_release() can be called before the fence was even signaled.
Previously the dma_fence framework then force signaled the fence to make sure to unblock waiters, but that can potentially lead to random memory corruption when the DMA operation continues. So be more defensive here and pick the lesser evil.
Instead of force signaling the fence set an error code on the fence, re-initialize the refcount to something large and taint the kernel.
This will leak memory and eventually can cause a deadlock when the fence is never signaled, but at least we won't run into an use after free or random memory corruption.
Signed-off-by: Christian König christian.koenig@amd.com
drivers/dma-buf/dma-fence.c | 18 ++++++++++++++---- 1 file changed, 14 insertions(+), 4 deletions(-)
diff --git a/drivers/dma-buf/dma-fence.c b/drivers/dma-buf/dma-fence.c index 1826ba73094c..8bf07685a053 100644 --- a/drivers/dma-buf/dma-fence.c +++ b/drivers/dma-buf/dma-fence.c @@ -593,14 +593,24 @@ void dma_fence_release(struct kref *kref) /* * Failed to signal before release, likely a refcounting issue. *
* This should never happen, but if it does make sure that we* don't leave chains dangling. We set the error flag first* so that the callbacks know this signal is due to an error.
* This should never happen, but if try to be defensive and take* the lesser evil. Initialize the refcount to something large,* but not so large that it can overflow.** That will leak memory and could deadlock if the fence never* signals, but at least it doesn't cause an use after free or* random memory corruption.** Also taint the kernel to note that it is rather unreliable to */ dma_fence_lock_irqsave(fence, flags); fence->error = -EDEADLK;* continue.
dma_fence_signal_locked(fence);
refcount_set(&fence->refcount.refcount, INT_MAX);I'm not convinced this is useful. If we leak the object, no one should have a ref to release anyway. This does raise a question though. The case we're trying to protect against is fence_callback being registered to this fence and waiting for an event to signal another proxy fence.
Not quite. The real problematic case is that it is necessary to wait for a fence to signal with tons of memory management locks held.
So it can be that a simple memory allocation cycles back and depends on the fence to signal.
How can the refcnt drop to zero in that case? Isn't the proxy supposed to own a ref on the fence. Before we go further, I'd like to understand what we're trying to do.
Well we are in C here, so its simply coding errors. An unecessary dma_fence_put() in an error path is enough to trigger this.
The original discussion that led you to write this patch was about detecting when a fence emitter/producer would leave unsignalled fences behind, and the problem we have is when such unsignalled fences have observers waiting for a "signalled" event. If the refcnt drops to zero and the fence is released, we're already passed that point, unfortunately.
Well that is not quite correct.
The most common problem is that we have unbalanced dma_fence_get()/dma_fence_put() and we end up in dma_fence_release() before the issuer of the dma_fence has a chance to signal it.
See the main purpose of DMA fences is to prevent releasing memory back into the core memory management before the DMA operation is completed.
So when a DMA fence signals to early it means that the HW is still writing to that memory but we already potentially re-using the memory ending in random memory corruption.
UAF issues are harmless compared to that.
Regards, Christian.
It can be that:
- the fence was never exposed -> this is fine
- the fence was exposed but never observed -> this is broken, because if it had been observed it would have led to a deadlock
- the fence was exposed, observed for some time, but the observer got bored, stopped waiting and:
- decided to go and execute its stuff anyway -> use-before-ready situation
- gave up -> kinda okay, but we should still consider the fence emitter broken
- the fence observer registered a callback but didn't take a ref on the object -> this is potential UAF on the dma_fence, which can also lead to a VRAM/system-mem UAF if the emitter drops the dma_fence without signalling, because of the auto-signal you're getting rid of in this patch. But the latter is just a side effect of the dma_fence UAF, which I'm not convinced we should try to protect against.
dma_fence_unlock_irqrestore(fence, flags);
rcu_read_unlock();add_taint(TAINT_SOFTLOCKUP, LOCKDEP_STILL_OK); }return;ops = rcu_dereference(fence->ops);
linaro-mm-sig@lists.linaro.org