Move the signalling tracepoint to before fence->ops are reset otherwise tracepoint will dereference a null pointer.
Signed-off-by: Tvrtko Ursulin tvrtko.ursulin@igalia.com Fixes: 541c8f2468b9 ("dma-buf: detach fence ops on signal v3") Cc: Christian König christian.koenig@amd.com Cc: Philipp Stanner phasta@kernel.org Cc: Boris Brezillon boris.brezillon@collabora.com Cc: linux-media@vger.kernel.org Cc: linaro-mm-sig@lists.linaro.org --- drivers/dma-buf/dma-fence.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/dma-buf/dma-fence.c b/drivers/dma-buf/dma-fence.c index 1826ba73094c..1c1eaecaf1b0 100644 --- a/drivers/dma-buf/dma-fence.c +++ b/drivers/dma-buf/dma-fence.c @@ -363,6 +363,8 @@ void dma_fence_signal_timestamp_locked(struct dma_fence *fence, &fence->flags))) return;
+ trace_dma_fence_signaled(fence); + /* * When neither a release nor a wait operation is specified set the ops * pointer to NULL to allow the fence structure to become independent @@ -377,7 +379,6 @@ void dma_fence_signal_timestamp_locked(struct dma_fence *fence,
fence->timestamp = timestamp; set_bit(DMA_FENCE_FLAG_TIMESTAMP_BIT, &fence->flags); - trace_dma_fence_signaled(fence);
list_for_each_entry_safe(cur, tmp, &cb_list, node) { INIT_LIST_HEAD(&cur->node);
IMO the title should state that this fixes a NULL ptr deref, since that's very significant.
On Mon, 2026-03-30 at 14:36 +0100, Tvrtko Ursulin wrote:
Move the signalling tracepoint to before fence->ops are reset otherwise tracepoint will dereference a null pointer.
Can't fully follow; you're talking about the fence ops detachment for signaled fences?
Signed-off-by: Tvrtko Ursulin tvrtko.ursulin@igalia.com Fixes: 541c8f2468b9 ("dma-buf: detach fence ops on signal v3") Cc: Christian König christian.koenig@amd.com Cc: Philipp Stanner phasta@kernel.org Cc: Boris Brezillon boris.brezillon@collabora.com Cc: linux-media@vger.kernel.org Cc: linaro-mm-sig@lists.linaro.org
drivers/dma-buf/dma-fence.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/dma-buf/dma-fence.c b/drivers/dma-buf/dma-fence.c index 1826ba73094c..1c1eaecaf1b0 100644 --- a/drivers/dma-buf/dma-fence.c +++ b/drivers/dma-buf/dma-fence.c @@ -363,6 +363,8 @@ void dma_fence_signal_timestamp_locked(struct dma_fence *fence, &fence->flags))) return;
- trace_dma_fence_signaled(fence);
If it's about touching the ops, the decisive action would be the IS_SIGNALED flag, wouldn't it? So trace_dma_fence_signaleld() should be above the flag test?
P.
/* * When neither a release nor a wait operation is specified set the ops * pointer to NULL to allow the fence structure to become independent @@ -377,7 +379,6 @@ void dma_fence_signal_timestamp_locked(struct dma_fence *fence, fence->timestamp = timestamp; set_bit(DMA_FENCE_FLAG_TIMESTAMP_BIT, &fence->flags);
- trace_dma_fence_signaled(fence);
list_for_each_entry_safe(cur, tmp, &cb_list, node) { INIT_LIST_HEAD(&cur->node);
On Mon, 30 Mar 2026 14:36:23 +0100 Tvrtko Ursulin tvrtko.ursulin@igalia.com wrote:
Move the signalling tracepoint to before fence->ops are reset otherwise tracepoint will dereference a null pointer.
I suspect other trace points are impacted too (trace_dma_fence_destroy() is, at the very least).
Signed-off-by: Tvrtko Ursulin tvrtko.ursulin@igalia.com Fixes: 541c8f2468b9 ("dma-buf: detach fence ops on signal v3") Cc: Christian König christian.koenig@amd.com Cc: Philipp Stanner phasta@kernel.org Cc: Boris Brezillon boris.brezillon@collabora.com Cc: linux-media@vger.kernel.org Cc: linaro-mm-sig@lists.linaro.org
drivers/dma-buf/dma-fence.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/dma-buf/dma-fence.c b/drivers/dma-buf/dma-fence.c index 1826ba73094c..1c1eaecaf1b0 100644 --- a/drivers/dma-buf/dma-fence.c +++ b/drivers/dma-buf/dma-fence.c @@ -363,6 +363,8 @@ void dma_fence_signal_timestamp_locked(struct dma_fence *fence, &fence->flags))) return;
- trace_dma_fence_signaled(fence);
- /*
- When neither a release nor a wait operation is specified set the ops
- pointer to NULL to allow the fence structure to become independent
@@ -377,7 +379,6 @@ void dma_fence_signal_timestamp_locked(struct dma_fence *fence, fence->timestamp = timestamp; set_bit(DMA_FENCE_FLAG_TIMESTAMP_BIT, &fence->flags);
- trace_dma_fence_signaled(fence);
list_for_each_entry_safe(cur, tmp, &cb_list, node) { INIT_LIST_HEAD(&cur->node);
On 31/03/2026 08:49, Boris Brezillon wrote:
On Mon, 30 Mar 2026 14:36:23 +0100 Tvrtko Ursulin tvrtko.ursulin@igalia.com wrote:
Move the signalling tracepoint to before fence->ops are reset otherwise tracepoint will dereference a null pointer.
I suspect other trace points are impacted too (trace_dma_fence_destroy() is, at the very least).
Indeed. I wonder why that did not trigger for me, while the one I fix here was an insta-crash...
To fix trace_dma_fence_destroy I think we need a new tracepoint definition ie. move it away from the existing event class - make it just log the context and seqno.
Anyone has a better idea?
Regards,
Tvrtko
Signed-off-by: Tvrtko Ursulin tvrtko.ursulin@igalia.com Fixes: 541c8f2468b9 ("dma-buf: detach fence ops on signal v3") Cc: Christian König christian.koenig@amd.com Cc: Philipp Stanner phasta@kernel.org Cc: Boris Brezillon boris.brezillon@collabora.com Cc: linux-media@vger.kernel.org Cc: linaro-mm-sig@lists.linaro.org
drivers/dma-buf/dma-fence.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/dma-buf/dma-fence.c b/drivers/dma-buf/dma-fence.c index 1826ba73094c..1c1eaecaf1b0 100644 --- a/drivers/dma-buf/dma-fence.c +++ b/drivers/dma-buf/dma-fence.c @@ -363,6 +363,8 @@ void dma_fence_signal_timestamp_locked(struct dma_fence *fence, &fence->flags))) return;
- trace_dma_fence_signaled(fence);
- /*
- When neither a release nor a wait operation is specified set the ops
- pointer to NULL to allow the fence structure to become independent
@@ -377,7 +379,6 @@ void dma_fence_signal_timestamp_locked(struct dma_fence *fence, fence->timestamp = timestamp; set_bit(DMA_FENCE_FLAG_TIMESTAMP_BIT, &fence->flags);
- trace_dma_fence_signaled(fence);
list_for_each_entry_safe(cur, tmp, &cb_list, node) { INIT_LIST_HEAD(&cur->node);
On 4/9/26 15:58, Tvrtko Ursulin wrote:
On 31/03/2026 08:49, Boris Brezillon wrote:
On Mon, 30 Mar 2026 14:36:23 +0100 Tvrtko Ursulin tvrtko.ursulin@igalia.com wrote:
Move the signalling tracepoint to before fence->ops are reset otherwise tracepoint will dereference a null pointer.
I suspect other trace points are impacted too (trace_dma_fence_destroy() is, at the very least).
Indeed. I wonder why that did not trigger for me, while the one I fix here was an insta-crash...
You need to actually enable the trace points and at least for the destroy one nobody is usually interested in that.
To fix trace_dma_fence_destroy I think we need a new tracepoint definition ie. move it away from the existing event class - make it just log the context and seqno.
Anyone has a better idea?
The idea of tracing without accessing fence->ops sounds valid to me.
Alternatively we could call dma_fence_timeline_name() and dma_fence_driver_name() from the tracepoint as well, but that means the tracepoints now require a RCU read side lock.
Regards, Christian.
Regards,
Tvrtko
Signed-off-by: Tvrtko Ursulin tvrtko.ursulin@igalia.com Fixes: 541c8f2468b9 ("dma-buf: detach fence ops on signal v3") Cc: Christian König christian.koenig@amd.com Cc: Philipp Stanner phasta@kernel.org Cc: Boris Brezillon boris.brezillon@collabora.com Cc: linux-media@vger.kernel.org Cc: linaro-mm-sig@lists.linaro.org
drivers/dma-buf/dma-fence.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/dma-buf/dma-fence.c b/drivers/dma-buf/dma-fence.c index 1826ba73094c..1c1eaecaf1b0 100644 --- a/drivers/dma-buf/dma-fence.c +++ b/drivers/dma-buf/dma-fence.c @@ -363,6 +363,8 @@ void dma_fence_signal_timestamp_locked(struct dma_fence *fence, &fence->flags))) return; + trace_dma_fence_signaled(fence);
/* * When neither a release nor a wait operation is specified set the ops * pointer to NULL to allow the fence structure to become independent @@ -377,7 +379,6 @@ void dma_fence_signal_timestamp_locked(struct dma_fence *fence, fence->timestamp = timestamp; set_bit(DMA_FENCE_FLAG_TIMESTAMP_BIT, &fence->flags); - trace_dma_fence_signaled(fence); list_for_each_entry_safe(cur, tmp, &cb_list, node) { INIT_LIST_HEAD(&cur->node);
On 10/04/2026 09:58, Christian König wrote:
On 4/9/26 15:58, Tvrtko Ursulin wrote:
On 31/03/2026 08:49, Boris Brezillon wrote:
On Mon, 30 Mar 2026 14:36:23 +0100 Tvrtko Ursulin tvrtko.ursulin@igalia.com wrote:
Move the signalling tracepoint to before fence->ops are reset otherwise tracepoint will dereference a null pointer.
I suspect other trace points are impacted too (trace_dma_fence_destroy() is, at the very least).
Indeed. I wonder why that did not trigger for me, while the one I fix here was an insta-crash...
You need to actually enable the trace points and at least for the destroy one nobody is usually interested in that.
Right, but I was pretty sure I was enabling perf record -e 'dma_fence:*' when I hit this. Anyway, it doesn't matter, I could be misremembering.
To fix trace_dma_fence_destroy I think we need a new tracepoint definition ie. move it away from the existing event class - make it just log the context and seqno.
Anyone has a better idea?
The idea of tracing without accessing fence->ops sounds valid to me.
Alternatively we could call dma_fence_timeline_name() and dma_fence_driver_name() from the tracepoint as well, but that means the tracepoints now require a RCU read side lock.
We could possibly use the helpers. I am not sure if RCU annotation would have to be casted away to keep sparse happy, but more importantly, I think it would not be safe.
thread A thread B
dma_fence_signal_timestamp_locked dma_fence_timeline_name .. ops = rcu_dereference(fence->ops); if (!dma_fence_test_signaled_flag(fence)) test_and_set_bit .. RCU_INIT_POINTER(fence->ops, NULL); return (const char __rcu *)ops->get_driver_name(fence); // OOPS!
Apologies for long line length, it did not fit otherwise.
Looks like we missed this. Or it is me who is missing something? Regards,
Tvrtko
Signed-off-by: Tvrtko Ursulin tvrtko.ursulin@igalia.com Fixes: 541c8f2468b9 ("dma-buf: detach fence ops on signal v3") Cc: Christian König christian.koenig@amd.com Cc: Philipp Stanner phasta@kernel.org Cc: Boris Brezillon boris.brezillon@collabora.com Cc: linux-media@vger.kernel.org Cc: linaro-mm-sig@lists.linaro.org
drivers/dma-buf/dma-fence.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/dma-buf/dma-fence.c b/drivers/dma-buf/dma-fence.c index 1826ba73094c..1c1eaecaf1b0 100644 --- a/drivers/dma-buf/dma-fence.c +++ b/drivers/dma-buf/dma-fence.c @@ -363,6 +363,8 @@ void dma_fence_signal_timestamp_locked(struct dma_fence *fence, &fence->flags))) return; + trace_dma_fence_signaled(fence);
/* * When neither a release nor a wait operation is specified set the ops * pointer to NULL to allow the fence structure to become independent @@ -377,7 +379,6 @@ void dma_fence_signal_timestamp_locked(struct dma_fence *fence, fence->timestamp = timestamp; set_bit(DMA_FENCE_FLAG_TIMESTAMP_BIT, &fence->flags); - trace_dma_fence_signaled(fence); list_for_each_entry_safe(cur, tmp, &cb_list, node) { INIT_LIST_HEAD(&cur->node);
On 4/10/26 17:37, Tvrtko Ursulin wrote:
On 10/04/2026 09:58, Christian König wrote:
On 4/9/26 15:58, Tvrtko Ursulin wrote:
On 31/03/2026 08:49, Boris Brezillon wrote:
On Mon, 30 Mar 2026 14:36:23 +0100 Tvrtko Ursulin tvrtko.ursulin@igalia.com wrote:
Move the signalling tracepoint to before fence->ops are reset otherwise tracepoint will dereference a null pointer.
I suspect other trace points are impacted too (trace_dma_fence_destroy() is, at the very least).
Indeed. I wonder why that did not trigger for me, while the one I fix here was an insta-crash...
You need to actually enable the trace points and at least for the destroy one nobody is usually interested in that.
Right, but I was pretty sure I was enabling perf record -e 'dma_fence:*' when I hit this. Anyway, it doesn't matter, I could be misremembering.
To fix trace_dma_fence_destroy I think we need a new tracepoint definition ie. move it away from the existing event class - make it just log the context and seqno.
Anyone has a better idea?
The idea of tracing without accessing fence->ops sounds valid to me.
Alternatively we could call dma_fence_timeline_name() and dma_fence_driver_name() from the tracepoint as well, but that means the tracepoints now require a RCU read side lock.
We could possibly use the helpers. I am not sure if RCU annotation would have to be casted away to keep sparse happy, but more importantly, I think it would not be safe.
thread A thread B
dma_fence_signal_timestamp_locked dma_fence_timeline_name .. ops = rcu_dereference(fence->ops); if (!dma_fence_test_signaled_flag(fence)) test_and_set_bit .. RCU_INIT_POINTER(fence->ops, NULL); return (const char __rcu *)ops->get_driver_name(fence); // OOPS!
Apologies for long line length, it did not fit otherwise.
Looks like we missed this. Or it is me who is missing something?
See function dma_fence_driver_name() and dma_fence_timeline_name():
ops = rcu_dereference(fence->ops); if (!dma_fence_test_signaled_flag(fence)) return (const char __rcu *)ops->get_driver_name(fence);
We first grab the ops pointer and then check if the fence is signaled or not. Since we first set the signaled flag and then NULL the ops pointer in the other thread we should be save here.
Could only be that test_bit() is not a memory barrier, but set_bit() is so that would be a bit surprising.
Alternatively I would be fine to switching testing ops for NULL instead of calling dma_fence_test_signaled_flag().
Regards, Christian.
Regards,
Tvrtko
Signed-off-by: Tvrtko Ursulin tvrtko.ursulin@igalia.com Fixes: 541c8f2468b9 ("dma-buf: detach fence ops on signal v3") Cc: Christian König christian.koenig@amd.com Cc: Philipp Stanner phasta@kernel.org Cc: Boris Brezillon boris.brezillon@collabora.com Cc: linux-media@vger.kernel.org Cc: linaro-mm-sig@lists.linaro.org
drivers/dma-buf/dma-fence.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/dma-buf/dma-fence.c b/drivers/dma-buf/dma-fence.c index 1826ba73094c..1c1eaecaf1b0 100644 --- a/drivers/dma-buf/dma-fence.c +++ b/drivers/dma-buf/dma-fence.c @@ -363,6 +363,8 @@ void dma_fence_signal_timestamp_locked(struct dma_fence *fence, &fence->flags))) return; + trace_dma_fence_signaled(fence);
/* * When neither a release nor a wait operation is specified set the ops * pointer to NULL to allow the fence structure to become independent @@ -377,7 +379,6 @@ void dma_fence_signal_timestamp_locked(struct dma_fence *fence, fence->timestamp = timestamp; set_bit(DMA_FENCE_FLAG_TIMESTAMP_BIT, &fence->flags); - trace_dma_fence_signaled(fence); list_for_each_entry_safe(cur, tmp, &cb_list, node) { INIT_LIST_HEAD(&cur->node);
On 13/04/2026 07:52, Christian König wrote:
On 4/10/26 17:37, Tvrtko Ursulin wrote:
On 10/04/2026 09:58, Christian König wrote:
On 4/9/26 15:58, Tvrtko Ursulin wrote:
On 31/03/2026 08:49, Boris Brezillon wrote:
On Mon, 30 Mar 2026 14:36:23 +0100 Tvrtko Ursulin tvrtko.ursulin@igalia.com wrote:
Move the signalling tracepoint to before fence->ops are reset otherwise tracepoint will dereference a null pointer.
I suspect other trace points are impacted too (trace_dma_fence_destroy() is, at the very least).
Indeed. I wonder why that did not trigger for me, while the one I fix here was an insta-crash...
You need to actually enable the trace points and at least for the destroy one nobody is usually interested in that.
Right, but I was pretty sure I was enabling perf record -e 'dma_fence:*' when I hit this. Anyway, it doesn't matter, I could be misremembering.
To fix trace_dma_fence_destroy I think we need a new tracepoint definition ie. move it away from the existing event class - make it just log the context and seqno.
Anyone has a better idea?
The idea of tracing without accessing fence->ops sounds valid to me.
Alternatively we could call dma_fence_timeline_name() and dma_fence_driver_name() from the tracepoint as well, but that means the tracepoints now require a RCU read side lock.
We could possibly use the helpers. I am not sure if RCU annotation would have to be casted away to keep sparse happy, but more importantly, I think it would not be safe.
thread A thread B
dma_fence_signal_timestamp_locked dma_fence_timeline_name .. ops = rcu_dereference(fence->ops); if (!dma_fence_test_signaled_flag(fence)) test_and_set_bit .. RCU_INIT_POINTER(fence->ops, NULL); return (const char __rcu *)ops->get_driver_name(fence); // OOPS!
Apologies for long line length, it did not fit otherwise.
Looks like we missed this. Or it is me who is missing something?
See function dma_fence_driver_name() and dma_fence_timeline_name():
ops = rcu_dereference(fence->ops); if (!dma_fence_test_signaled_flag(fence)) return (const char __rcu *)ops->get_driver_name(fence);
We first grab the ops pointer and then check if the fence is signaled or not. Since we first set the signaled flag and then NULL the ops pointer in the other thread we should be save here.
Could only be that test_bit() is not a memory barrier, but set_bit() is so that would be a bit surprising.
You are right, I got confused jumping back and forth between patches last week.
So changing to helpers on top of the change from this patch should be enough.
I only need to check if the __rcu needs to be casted away in the tracepoints assignments in case it can upset sparse. I will send a v2 when I do that.
Regards,
Tvrtko
Alternatively I would be fine to switching testing ops for NULL instead of calling dma_fence_test_signaled_flag().
Regards, Christian.
Regards,
Tvrtko
Signed-off-by: Tvrtko Ursulin tvrtko.ursulin@igalia.com Fixes: 541c8f2468b9 ("dma-buf: detach fence ops on signal v3") Cc: Christian König christian.koenig@amd.com Cc: Philipp Stanner phasta@kernel.org Cc: Boris Brezillon boris.brezillon@collabora.com Cc: linux-media@vger.kernel.org Cc: linaro-mm-sig@lists.linaro.org
drivers/dma-buf/dma-fence.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/dma-buf/dma-fence.c b/drivers/dma-buf/dma-fence.c index 1826ba73094c..1c1eaecaf1b0 100644 --- a/drivers/dma-buf/dma-fence.c +++ b/drivers/dma-buf/dma-fence.c @@ -363,6 +363,8 @@ void dma_fence_signal_timestamp_locked(struct dma_fence *fence, &fence->flags))) return; + trace_dma_fence_signaled(fence);
/* * When neither a release nor a wait operation is specified set the ops * pointer to NULL to allow the fence structure to become independent @@ -377,7 +379,6 @@ void dma_fence_signal_timestamp_locked(struct dma_fence *fence, fence->timestamp = timestamp; set_bit(DMA_FENCE_FLAG_TIMESTAMP_BIT, &fence->flags); - trace_dma_fence_signaled(fence); list_for_each_entry_safe(cur, tmp, &cb_list, node) { INIT_LIST_HEAD(&cur->node);
On 3/30/26 15:36, Tvrtko Ursulin wrote:
Move the signalling tracepoint to before fence->ops are reset otherwise tracepoint will dereference a null pointer.
Signed-off-by: Tvrtko Ursulin tvrtko.ursulin@igalia.com Fixes: 541c8f2468b9 ("dma-buf: detach fence ops on signal v3") Cc: Christian König christian.koenig@amd.com Cc: Philipp Stanner phasta@kernel.org Cc: Boris Brezillon boris.brezillon@collabora.com Cc: linux-media@vger.kernel.org Cc: linaro-mm-sig@lists.linaro.org
Good catch. I just silently assumed that the fence is not signaled when we traced signaling.
Going to take another look since there might be more problems, but for now:
Reviewed-by: Christian König christian.koenig@amd.com
drivers/dma-buf/dma-fence.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/dma-buf/dma-fence.c b/drivers/dma-buf/dma-fence.c index 1826ba73094c..1c1eaecaf1b0 100644 --- a/drivers/dma-buf/dma-fence.c +++ b/drivers/dma-buf/dma-fence.c @@ -363,6 +363,8 @@ void dma_fence_signal_timestamp_locked(struct dma_fence *fence, &fence->flags))) return;
- trace_dma_fence_signaled(fence);
- /*
- When neither a release nor a wait operation is specified set the ops
- pointer to NULL to allow the fence structure to become independent
@@ -377,7 +379,6 @@ void dma_fence_signal_timestamp_locked(struct dma_fence *fence, fence->timestamp = timestamp; set_bit(DMA_FENCE_FLAG_TIMESTAMP_BIT, &fence->flags);
- trace_dma_fence_signaled(fence);
list_for_each_entry_safe(cur, tmp, &cb_list, node) { INIT_LIST_HEAD(&cur->node);
linaro-mm-sig@lists.linaro.org