Am 10.04.25 um 15:09 schrieb Philipp Stanner:
On Thu, 2025-04-10 at 14:58 +0200, Christian König wrote:
Am 10.04.25 um 11:24 schrieb Philipp Stanner:
Nouveau currently relies on the assumption that dma_fences will only ever get signaled through nouveau_fence_signal(), which takes care of removing a signaled fence from the list nouveau_fence_chan.pending.
This self-imposed rule is violated in nouveau_fence_done(), where dma_fence_is_signaled() (somewhat surprisingly, considering its name) can signal the fence without removing it from the list. This enables accesses to already signaled fences through the list, which is a bug.
In particular, it can race with nouveau_fence_context_kill(), which would then attempt to set an error code on an already signaled fence, which is illegal.
In nouveau_fence_done(), the call to nouveau_fence_update() already ensures to signal all ready fences. Thus, the signaling potentially performed by dma_fence_is_signaled() is actually not necessary.
Ah, I now got what you are trying to do here! But that won't help.
The problem is it is perfectly valid for somebody external (e.g. other driver, TTM etc...) to call dma_fence_is_signaled() on a nouveau fence.
This will then in turn still signal the fence and leave it on the pending list and creating the problem you have.
Good to hear – precisely that then is the use case for a dma_fence callback! ^_^ It guarantees that, no matter who signals a fence, no matter at what place, a certain action will always be performed.
I can't think of any other mechanism which could guarantee that a signaled fence immediately gets removed from nouveau's pending list, other than the callbacks.
But seriously, I don't think that anyone does this currently, nor do I think that anyone could get away with doing it without the entire computer burning down.
Yeah, I don't think that this is possible at the moment.
When you do stuff like that from the provider side you will always run into lifetime issues because in the signaling from interrupt case you then drop the last reference before the signaling is completed.
How about the attached (not even compile tested) patch? I think it should fix the issue.
Regards, Christian.
P.
Regards, Christian.
Replace the call to dma_fence_is_signaled() with nouveau_fence_base_is_signaled().
Cc: stable@vger.kernel.org # 4.10+, precise commit not to be determined Signed-off-by: Philipp Stanner phasta@kernel.org
drivers/gpu/drm/nouveau/nouveau_fence.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/nouveau/nouveau_fence.c b/drivers/gpu/drm/nouveau/nouveau_fence.c index 7cc84472cece..33535987d8ed 100644 --- a/drivers/gpu/drm/nouveau/nouveau_fence.c +++ b/drivers/gpu/drm/nouveau/nouveau_fence.c @@ -274,7 +274,7 @@ nouveau_fence_done(struct nouveau_fence *fence) nvif_event_block(&fctx->event); spin_unlock_irqrestore(&fctx->lock, flags); }
- return dma_fence_is_signaled(&fence->base);
- return test_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &fence-
base.flags);
} static long
linaro-mm-sig@lists.linaro.org