On Fri, Jun 19, 2020 at 05:06:04PM +0200, Daniel Vetter wrote:
On Fri, Jun 19, 2020 at 1:39 PM Jason Gunthorpe <jgg@ziepe.ca> wrote:
On Fri, Jun 19, 2020 at 09:22:09AM +0200, Daniel Vetter wrote:
As I understand GPUs, that means you need to show that the commands associated with the buffer have completed. This is all local stuff within the driver, right? Why use a fence (other than that it already exists)?
Because that's the end-of-dma thing. And it's cross-driver for the above reasons, e.g.
- device A renders some stuff. Userspace gets dma_fence A out of that
(well sync_file or one of the other uapi interfaces, but you get the idea)
- userspace (across process or just different driver) issues more
rendering for device B, which depends upon the rendering done on device A. So dma_fence A is a dependency and will block this dma operation. Userspace (and the kernel) gets dma_fence B out of this
- because unfortunate reasons, the same rendering on device B also
needs a userptr buffer, which means that dma_fence B is also the one that the mmu_range_notifier needs to wait on before it can tell core mm that it can go ahead and release those pages
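The chain in the three steps above can be sketched as a toy user-space model. This is only an illustration, not the kernel dma_fence API: the names ToyFence and blocking_fences are hypothetical, and real fences signal asynchronously from hardware rather than through an explicit call.

```python
# Toy user-space model only; NOT the kernel dma_fence API.
# ToyFence and blocking_fences are hypothetical names for illustration.


class ToyFence:
    """Stand-in for a dma_fence: signals once, after all its dependencies."""

    def __init__(self, name, deps=()):
        self.name = name
        self.deps = list(deps)
        self.signaled = False

    def signal(self):
        # A job cannot complete before the jobs it depends on.
        if not all(d.signaled for d in self.deps):
            raise RuntimeError(f"{self.name} signaled before its dependencies")
        self.signaled = True


def blocking_fences(fence):
    """Names of all unsignaled fences that must signal before `fence`
    is done (including `fence` itself), in the order they must signal."""
    order = []
    seen = set()

    def visit(f):
        if id(f) in seen or f.signaled:
            return
        seen.add(id(f))
        for dep in f.deps:
            visit(dep)
        order.append(f.name)

    visit(fence)
    return order


fence_a = ToyFence("A")                  # rendering on device A
fence_b = ToyFence("B", deps=[fence_a])  # rendering on device B, queued behind A

# Device B's job also uses a userptr buffer, so its invalidate callback
# has to wait for fence B, and therefore indirectly for fence A.
print(blocking_fences(fence_b))

fence_a.signal()
print(blocking_fences(fence_b))

fence_b.signal()
print(blocking_fences(fence_b))
```

In this model the wait in device B's notifier is only ever on B's own fence; fence A shows up solely because B's job was queued behind it.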
I was afraid you'd say this - this is complete madness for other DMA devices to borrow the notifier hook of the first device!
The first device might not even have a notifier. This is the 2nd device, waiting on a dma_fence of its own, but which happens to be queued up as a dma operation behind something else.
What if the first device is a page faulting device and doesn't call dma_fence??
Not sure what you mean by this ... even if it does page faulting for some other reason, it'll emit a dma_fence which the 2nd device can consume as a dependency.
At some point the pages under the buffer have to be either pinned or protected by mmu notifier. So each and every single device doing DMA to these pages must either pin, or use mmu notifier.
Driver A should never 'borrow' a notifier from B
If each driver controls its own lifetime of the buffers, why can't the driver locally wait for its device to finish?
Can't the GPUs cancel work that is waiting on a DMA fence? I.e., if Driver A detects that work completed and wants to trigger a DMA fence, but it now knows the buffer is invalidated, can't it tell driver B to give up?
The problem is that there's piles of other dependencies for a dma job. GPU doesn't just consume a single buffer each time, it consumes entire lists of buffers and mixes them all up in funny ways. Some of these buffers are userptr, entirely local to the device. Other buffers are just normal device driver allocations (and managed with some shrinker to keep them in check). And then there's the actually shared dma-buf with other devices. The trouble is that they're all bundled up together.
But why does this matter? Does the GPU itself consume some work and then stall internally waiting for an external DMA fence?
Otherwise I would expect this dependency chain should be breakable by aborting work waiting on fences upon invalidation (without stalling)
There should be no need to wait on dma_fence in notifiers.
Maybe :-) The goal of this series is more to document current rules and make them more consistent. Fixing them if we don't like them might be a follow-up task, but that would likely be a pile more work. First we need to know what the exact shape of the problem even is.
Fair enough
Jason