In case of a refcounting bug, dma_fence_release() can be called before the fence has even been signaled.
Previously, the dma_fence framework then force-signaled the fence to make sure waiters were unblocked, but that can lead to random memory corruption when the DMA operation continues. So be more defensive here and pick the lesser evil.
Instead of force-signaling the fence, set an error code on the fence, re-initialize the refcount to something large and taint the kernel.
This will leak memory and can eventually cause a deadlock when the fence is never signaled, but at least we won't run into a use-after-free or random memory corruption.
Signed-off-by: Christian König <christian.koenig@amd.com>
---
 drivers/dma-buf/dma-fence.c | 18 ++++++++++++++----
 1 file changed, 14 insertions(+), 4 deletions(-)

diff --git a/drivers/dma-buf/dma-fence.c b/drivers/dma-buf/dma-fence.c
index 1826ba73094c..8bf07685a053 100644
--- a/drivers/dma-buf/dma-fence.c
+++ b/drivers/dma-buf/dma-fence.c
@@ -593,14 +593,24 @@ void dma_fence_release(struct kref *kref)
 		/*
 		 * Failed to signal before release, likely a refcounting issue.
 		 *
-		 * This should never happen, but if it does make sure that we
-		 * don't leave chains dangling. We set the error flag first
-		 * so that the callbacks know this signal is due to an error.
+		 * This should never happen, but if it does, try to be defensive
+		 * and take the lesser evil. Initialize the refcount to
+		 * something large, but not so large that it can overflow.
+		 *
+		 * That will leak memory and could deadlock if the fence never
+		 * signals, but at least it doesn't cause a use-after-free or
+		 * random memory corruption.
+		 *
+		 * Also taint the kernel to note that it is rather unreliable to
+		 * continue.
 		 */
 		dma_fence_lock_irqsave(fence, flags);
 		fence->error = -EDEADLK;
-		dma_fence_signal_locked(fence);
+		refcount_set(&fence->refcount.refcount, INT_MAX);
 		dma_fence_unlock_irqrestore(fence, flags);
+		rcu_read_unlock();
+		add_taint(TAINT_SOFTLOCKUP, LOCKDEP_STILL_OK);
+		return;
 	}
 
 	ops = rcu_dereference(fence->ops);
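
For illustration only, the release policy described in the commit message can be modelled in plain userspace C. This is a rough sketch and not kernel code: struct toy_fence, toy_tainted and the toy_fence_*() helpers are hypothetical stand-ins for struct dma_fence, the kernel taint state and the dma_fence API, and there is no locking or RCU. It only shows the decision made when the last reference goes away before the fence was signaled: record the error, pin the object with a large refcount, note the taint, and do not free.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical userspace stand-in for struct dma_fence; not the kernel type. */
struct toy_fence {
	int refcount;
	int error;
	bool signaled;
};

/* Hypothetical stand-in for the taint state that add_taint() would set. */
static bool toy_tainted;

/* Allocate a fence holding one reference, unsignaled and without error. */
static struct toy_fence *toy_fence_alloc(void)
{
	struct toy_fence *f = calloc(1, sizeof(*f));

	if (f)
		f->refcount = 1;
	return f;
}

/* Mark the fence as signaled, i.e. the modelled operation has completed. */
static void toy_fence_signal(struct toy_fence *f)
{
	f->signaled = true;
}

/*
 * Drop one reference. Returns true if the fence was freed, false if it is
 * still alive (either because references remain or because it was pinned).
 */
static bool toy_fence_put(struct toy_fence *f)
{
	assert(f->refcount > 0);

	if (--f->refcount > 0)
		return false;

	if (!f->signaled) {
		/*
		 * Last reference dropped before signaling: deliberately leak
		 * the object instead of signaling and freeing it.
		 */
		f->error = -EDEADLK;
		f->refcount = INT_MAX;
		toy_tainted = true;
		return false;
	}

	free(f);
	return true;
}
```

In this model a fence that is signaled before its last toy_fence_put() is freed normally, while one released too early stays allocated with error set to -EDEADLK, which corresponds to the memory leak accepted in the commit message.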