Hi Boris,
On 3/18/26 10:18, Boris Brezillon wrote:
Hi Christian,
On Wed, 18 Mar 2026 09:21:34 +0100 Christian König christian.koenig@amd.com wrote:
On 3/17/26 16:21, Boris Brezillon wrote:
On Tue, 17 Mar 2026 15:48:25 +0100 "Christian König" ckoenig.leichtzumerken@gmail.com wrote:
In case of a refcounting bug dma_fence_release() can be called before the fence was even signaled.
Previously the dma_fence framework then force signaled the fence to make sure to unblock waiters, but that can potentially lead to random memory corruption when the DMA operation continues. So be more defensive here and pick the lesser evil.
Instead of force signaling the fence set an error code on the fence, re-initialize the refcount to something large and taint the kernel.
This will leak memory and can eventually cause a deadlock when the fence is never signaled, but at least we won't run into a use after free or random memory corruption.
Signed-off-by: Christian König christian.koenig@amd.com
 drivers/dma-buf/dma-fence.c | 18 ++++++++++++++----
 1 file changed, 14 insertions(+), 4 deletions(-)
diff --git a/drivers/dma-buf/dma-fence.c b/drivers/dma-buf/dma-fence.c
index 1826ba73094c..8bf07685a053 100644
--- a/drivers/dma-buf/dma-fence.c
+++ b/drivers/dma-buf/dma-fence.c
@@ -593,14 +593,24 @@ void dma_fence_release(struct kref *kref)
 		/*
 		 * Failed to signal before release, likely a refcounting issue.
 		 *
-		 * This should never happen, but if it does make sure that we
-		 * don't leave chains dangling. We set the error flag first
-		 * so that the callbacks know this signal is due to an error.
+		 * This should never happen, but if it does try to be defensive
+		 * and take the lesser evil. Initialize the refcount to
+		 * something large, but not so large that it can overflow.
+		 *
+		 * That will leak memory and could deadlock if the fence never
+		 * signals, but at least it doesn't cause a use after free or
+		 * random memory corruption.
+		 *
+		 * Also taint the kernel to note that it is rather unreliable
+		 * to continue.
 		 */
 		dma_fence_lock_irqsave(fence, flags);
 		fence->error = -EDEADLK;
-		dma_fence_signal_locked(fence);
+		refcount_set(&fence->refcount.refcount, INT_MAX);

I'm not convinced this is useful. If we leak the object, no one should have a ref to release anyway. This does raise a question though. The case we're trying to protect against is fence_callback being registered to this fence and waiting for an event to signal another proxy fence.
Not quite. The real problematic case is that it is necessary to wait for a fence to signal with tons of memory management locks held.
So it can be that a simple memory allocation cycles back and depends on the fence to signal.
How can the refcnt drop to zero in that case? Isn't the proxy supposed to own a ref on the fence. Before we go further, I'd like to understand what we're trying to do.
Well we are in C here, so it's simply coding errors. An unnecessary dma_fence_put() in an error path is enough to trigger this.
The original discussion that led you to write this patch was about detecting when a fence emitter/producer would leave unsignalled fences behind, and the problem we have is when such unsignalled fences have observers waiting for a "signalled" event. If the refcnt drops to zero and the fence is released, we're already past that point, unfortunately.
Well that is not quite correct.
The most common problem is that we have unbalanced dma_fence_get()/dma_fence_put() and we end up in dma_fence_release() before the issuer of the dma_fence has a chance to signal it.
Okay, so that's clearly not solving the problem we were discussing on [1], I thought it was related.
Yeah, correct. The situation on the Rust side is clearly different, you simply don't have these incorrect refcounting issues there.
Also, I'm still skeptical that we should try and harden security for a situation that's already covered by refcount overflow detection.
Refcount overflow detection is unfortunately not enabled everywhere, and even if it is enabled it doesn't protect against issues like this; it only points them out when it is already too late.
I get why you want to do that, but it feels like the wrong tool to me. I mean, we wouldn't even see it as an unbalanced dma_fence_get/put() now that you manually set the refcount to INT_MAX, which is the bug you're trying to cover for in the first place.
See the main purpose of DMA fences is to prevent releasing memory back into the core memory management before the DMA operation is completed.
That's a UAF, just a different kind (device UAF instead of CPU UAF).
Yeah agree completely.
The problem is that SW UAF issues are preventable by using something like Rust, while HW UAF issues can only be caught by an IOMMU, and that in turn is disabled more often than not.
Especially GPUs and accelerators usually use pass through mode for IOMMU because of both HW bugs as well as performance overhead.
Anyway, my point remains, the root of the issue you're covering for is a dma_fence UAF (more put()s than get()s, and the CPU still has a ref on a released dma_fence object). The outcome of this might be device UAF because of the auto-signalling, but that's still just another symptom of the dma_fence UAF (with wider consequences, admittedly).
So when a DMA fence signals too early it means that the HW is still writing to that memory while we are already potentially re-using it, ending in random memory corruption.
Yep, I'm well aware of that.
UAF issues are harmless compared to that.
That's not what I'm arguing against. What I'm saying is that you just paper over an issue by messing with the refcount, and now it's hard to tell what the root cause is.
Completely agree as well. It's not a real solution, but only the lesser evil.
Regards, Christian.
Regards,
Boris
[1] https://yhbt.net/lore/all/8bac1559-e139-4a74-a6e8-c2846093db72@amd.com/