Re: [Linaro-mm-sig] [PATCH 04/18] dma-fence: prime lockdep annotations

19 Jun 2020

      On Fri, Jun 19, 2020 at 05:06:04PM +0200, Daniel Vetter wrote:
...
On Fri, Jun 19, 2020 at 1:39 PM Jason Gunthorpe <jgg@ziepe.ca> wrote:
...
On Fri, Jun 19, 2020 at 09:22:09AM +0200, Daniel Vetter wrote:
...
...
As I understand GPUs, that means you need to show that the commands
associated with the buffer have completed. This is all local stuff
within the driver, right? Why use a fence (other than that it already
exists)?
Because that's the end-of-dma thing. And it's cross-driver for the
above reasons, e.g.

- device A renders some stuff. Userspace gets dma_fence A out of that
  (well sync_file or one of the other uapi interfaces, but you get the
  idea)

- userspace (across process or just different driver) issues more
  rendering for device B, which depends upon the rendering done on
  device A. So dma_fence A is a dependency and will block this dma
  operation. Userspace (and the kernel) gets dma_fence B out of this

- because of unfortunate reasons, the same rendering on device B also
  needs a userptr buffer, which means that dma_fence B is also the one
  that the mmu_range_notifier needs to wait on before it can tell core
  mm that it can go ahead and release those pages
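
The three-step scenario above can be sketched as a toy userspace model. This is only an illustration of the dependency chain, not kernel code: `Fence` and `blocking_chain` are hypothetical names invented for this sketch, and a real dma_fence has no such API. It shows why a notifier that waits on dma_fence B ends up waiting on device A as well.

```python
# Toy model of the scenario above. All names here are hypothetical
# illustrations; nothing in this block is a real kernel interface.

class Fence:
    """Stands in for a dma_fence: signaled once its DMA work is done."""

    def __init__(self, name, deps=()):
        self.name = name
        self.deps = list(deps)   # fences that must signal before this one can
        self.signaled = False

    def can_signal(self):
        return all(dep.signaled for dep in self.deps)

    def signal(self):
        if not self.can_signal():
            raise RuntimeError(f"{self.name} is still blocked on a dependency")
        self.signaled = True


def blocking_chain(fence):
    """Names of every unsignaled fence that a waiter on `fence` is stuck behind."""
    chain, stack, seen = [], [fence], set()
    while stack:
        current = stack.pop()
        if id(current) in seen or current.signaled:
            continue
        seen.add(id(current))
        chain.append(current.name)
        stack.extend(current.deps)
    return chain


fence_a = Fence("dma_fence A")                  # device A rendering
fence_b = Fence("dma_fence B", deps=[fence_a])  # device B job, queued behind A

# Device B's notifier has to wait on fence B before the userptr pages can be
# released, so it is transitively waiting on device A too.
waiting_on = blocking_chain(fence_b)
print(waiting_on)
```

Once `fence_a.signal()` and then `fence_b.signal()` have been called, `blocking_chain(fence_b)` returns an empty list and the notifier in this model could let core mm proceed.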
I was afraid you'd say this - this is complete madness for other DMA
devices to borrow the notifier hook of the first device!
The first device might not even have a notifier. This is the 2nd
device, waiting on a dma_fence of its own, but which happens to be
queued up as a dma operation behind something else.
...
What if the first device is a page faulting device and doesn't call
dma_fence??
Not sure what you mean with this ... even if it does page-faulting for
some other reasons, it'll emit a dma_fence which the 2nd device can
consume as a dependency.
At some point the pages under the buffer have to be either pinned
or protected by mmu notifier. So each and every single device doing
DMA to these pages must either pin, or use mmu notifier.
Driver A should never 'borrow' a notifier from B.
If each driver controls its own lifetime of the buffers, why can't the
driver locally wait for its device to finish?
Can't the GPUs cancel work that is waiting on a DMA fence? I.e. if
Driver A detects that work completed and wants to trigger a DMA fence,
but it now knows the buffer is invalidated, can't it tell driver B to
give up?
...
The problem is that there's piles of other dependencies for a dma job.
GPU doesn't just consume a single buffer each time, it consumes entire
lists of buffers and mixes them all up in funny ways. Some of these
buffers are userptr, entirely local to the device. Other buffers are
just normal device driver allocations (and managed with some shrinker
to keep them in check). And then there's the actually shared dma-buf
with other devices. The trouble is that they're all bundled up
together.
But why does this matter? Does the GPU itself consume some work and
then stall internally waiting for an external DMA fence?
Otherwise I would expect this dependency chain should be breakable by
aborting work waiting on fences upon invalidation (without stalling)
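
What "aborting work waiting on fences upon invalidation" might look like can be sketched with another toy model. This only illustrates the question being asked, under the assumption that a queued job can be dropped before it reaches the hardware; `Job` and `invalidate` are hypothetical names, not an existing driver or dma_fence interface, and the thread does not establish that drivers can actually do this.

```python
# Toy model of the "abort instead of stall" idea. Hypothetical names only;
# this is not how any existing driver is known to behave.
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    buffers: set              # every buffer the job consumes, bundled together
    started: bool = False     # True once the hardware has begun executing it
    cancelled: bool = False


def invalidate(jobs, buffer):
    """Cancel queued jobs that touch `buffer`.

    Returns the names of jobs that are already running and therefore
    still have to be waited on.
    """
    must_wait = []
    for job in jobs:
        if buffer not in job.buffers or job.cancelled:
            continue
        if job.started:
            must_wait.append(job.name)   # already on the hardware
        else:
            job.cancelled = True         # still blocked on a fence: drop it
    return must_wait


jobs = [
    Job("B-queued", {"userptr", "shared dma-buf"}),
    Job("B-running", {"userptr"}, started=True),
    Job("B-other", {"driver bo"}),
]
still_running = invalidate(jobs, "userptr")
print(still_running)
```

In this model only the job that has already started forces a wait. The queued job is cancelled as a whole even though it also references a shared dma-buf, which is the bundling problem described above.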
...
...
Do not need to wait on dma_fence in notifiers.
Maybe :-) The goal of this series is more to document current rules
and make them more consistent. Fixing them if we don't like them might
be a follow-up task, but that would likely be a pile more work. First
we need to know what the exact shape of the problem even is.
Fair enough
Jason
