Re: [PATCH] dma-buf/dma_fence: be more defensive in dma_fence_release - Linaro-mm-sig

18 Mar 2026

On 3/17/26 16:21, Boris Brezillon wrote:
...
On Tue, 17 Mar 2026 15:48:25 +0100
"Christian König" ckoenig.leichtzumerken@gmail.com wrote:
...
In case of a refcounting bug dma_fence_release() can be called before the
fence was even signaled.

Previously the dma_fence framework then force-signaled the fence to make
sure to unblock waiters, but that can potentially lead to random memory
corruption when the DMA operation continues. So be more defensive here and
pick the lesser evil.

Instead of force-signaling the fence, set an error code on the fence,
re-initialize the refcount to something large and taint the kernel.

This will leak memory and can eventually cause a deadlock when the fence
is never signaled, but at least we won't run into a use-after-free or
random memory corruption.

Signed-off-by: Christian König <christian.koenig@amd.com>
---
 drivers/dma-buf/dma-fence.c | 18 ++++++++++++++----
 1 file changed, 14 insertions(+), 4 deletions(-)

diff --git a/drivers/dma-buf/dma-fence.c b/drivers/dma-buf/dma-fence.c
index 1826ba73094c..8bf07685a053 100644
--- a/drivers/dma-buf/dma-fence.c
+++ b/drivers/dma-buf/dma-fence.c
@@ -593,14 +593,24 @@ void dma_fence_release(struct kref *kref)
 		/*
 		 * Failed to signal before release, likely a refcounting issue.
 		 *
-		 * This should never happen, but if it does make sure that we
-		 * don't leave chains dangling. We set the error flag first
-		 * so that the callbacks know this signal is due to an error.
+		 * This should never happen, but if it does, try to be defensive
+		 * and take the lesser evil. Initialize the refcount to something
+		 * large, but not so large that it can overflow.
+		 *
+		 * That will leak memory and could deadlock if the fence never
+		 * signals, but at least it doesn't cause a use-after-free or
+		 * random memory corruption.
+		 *
+		 * Also taint the kernel to note that it is rather unreliable to
+		 * continue.
 		 */
 		dma_fence_lock_irqsave(fence, flags);
 		fence->error = -EDEADLK;
-		dma_fence_signal_locked(fence);
+		refcount_set(&fence->refcount.refcount, INT_MAX);

I'm not convinced this is useful. If we leak the object, no one should
have a ref to release anyway. This does raise a question though. The
case we're trying to protect against is a fence_callback being registered
to this fence and waiting for an event to signal another proxy fence.

Not quite. The really problematic case is that we have to wait for a
fence to signal with tons of memory management locks held. So it can be
that a simple memory allocation cycles back and depends on the fence to
signal.
...
How can the refcnt drop to zero in that case? Isn't the proxy supposed
to own a ref on the fence? Before we go further, I'd like to understand
what we're trying to do.

Well, we are in C here, so it's simply coding errors. An unnecessary
dma_fence_put() in an error path is enough to trigger this.
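
To illustrate the kind of coding error meant here, a minimal userspace
sketch follows. It is not kernel code and not part of the patch: the
toy_fence type and the toy_* helpers are hypothetical stand-ins for
struct dma_fence, dma_fence_get() and dma_fence_put(), and the refcount
is a plain int rather than a kref.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct dma_fence, for illustration only. */
struct toy_fence {
	int refcount;             /* plain counter instead of a kref */
	bool signaled;            /* set by the emitter when the DMA work is done */
	bool released_unsignaled; /* records the situation the patch handles */
};

static void toy_fence_get(struct toy_fence *f)
{
	f->refcount++;
}

static void toy_fence_put(struct toy_fence *f)
{
	/* Dropping the last reference is where a release function would run. */
	if (--f->refcount == 0 && !f->signaled)
		f->released_unsignaled = true;
}

/*
 * Buggy consumer: on the error path it drops a reference it never took,
 * which silently eats the emitter's reference.
 */
static int toy_attach(struct toy_fence *f, bool setup_fails)
{
	if (setup_fails) {
		toy_fence_put(f); /* BUG: no matching toy_fence_get() */
		return -1;
	}
	toy_fence_get(f);
	return 0;
}
```

Starting from a fence with a refcount of 1 (the emitter's reference),
a single toy_attach(&f, true) call takes the count to zero while the
fence is still unsignaled, which is the state in which the real
dma_fence_release() would be entered too early.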
...
The original discussion that led you to write this patch was about
detecting when a fence emitter/producer would leave unsignalled fences
behind, and the problem we have is when such unsignalled fences have
observers waiting for a "signalled" event. If the refcnt drops to zero
and the fence is released, we're already past that point,
unfortunately.

Well, that is not quite correct.

The most common problem is that we have unbalanced
dma_fence_get()/dma_fence_put() calls and end up in dma_fence_release()
before the issuer of the dma_fence has a chance to signal it.

See, the main purpose of DMA fences is to prevent releasing memory back
into the core memory management before the DMA operation is completed.

So when a DMA fence signals too early, it means that the HW is still
writing to that memory while we are potentially already reusing it,
ending in random memory corruption.

UAF issues are harmless compared to that.

Regards,
Christian.
...
It can be that:

- the fence was never exposed -> this is fine
- the fence was exposed but never observed -> this is broken, because if
  it had been observed it would have led to a deadlock
- the fence was exposed, observed for some time, but the observer got
  bored, stopped waiting and:
  * decided to go and execute its stuff anyway -> use-before-ready
    situation
  * gave up -> kinda okay, but we should still consider the fence
    emitter broken
- the fence observer registered a callback but didn't take a ref on the
  object -> this is potential UAF on the dma_fence, which can also lead
  to a VRAM/system-mem UAF if the emitter drops the dma_fence without
  signalling, because of the auto-signal you're getting rid of in this
  patch. But the latter is just a side effect of the dma_fence UAF,
  which I'm not convinced we should try to protect against.
...
 		dma_fence_unlock_irqrestore(fence, flags);
+		rcu_read_unlock();
+		add_taint(TAINT_SOFTLOCKUP, LOCKDEP_STILL_OK);
+		return;
 	}
 
 	ops = rcu_dereference(fence->ops);
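
For readers who want to see the policy change in isolation, here is a
rough userspace model of what the hunk above does. It is only a sketch
under stated assumptions: toy_fence, toy_tainted and toy_fence_release()
are hypothetical names, the "pending callbacks and not signaled" check is
an assumed stand-in for the condition guarding this branch (which the
hunk does not show), and the locking and RCU handling are left out.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct dma_fence, for illustration only. */
struct toy_fence {
	int refcount;
	int error;
	bool signaled;
	bool pending_callbacks; /* assumed model of "someone still waits" */
};

/* Stand-in for the kernel taint state set by add_taint(). */
static bool toy_tainted;

/*
 * Model of the patched release path. Returns true if the caller may go
 * on and free the fence, false if the fence was pinned and leaked.
 */
static bool toy_fence_release(struct toy_fence *f)
{
	if (f->pending_callbacks && !f->signaled) {
		f->error = -EDEADLK;   /* mark the fence as broken */
		f->refcount = INT_MAX; /* pin it so it is never freed */
		toy_tainted = true;    /* note that continuing is unreliable */
		return false;          /* leak instead of force-signaling */
	}
	return true;
}
```

The point of the model is the trade-off discussed in the thread: the
fence stays unsignaled, so waiters can block forever, but nothing that
depends on the fence gets to reuse the memory while the DMA operation
may still be running.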