Hi All,

On 12 September 2011 19:37, Daniel Vetter <daniel@ffwll.ch> wrote:
On Sun, Sep 11, 2011 at 10:32:20AM -0500, Clark, Rob wrote:
> On Sat, Sep 10, 2011 at 6:45 AM, Daniel Vetter <daniel@ffwll.ch> wrote:
> > On Fri, Sep 09, 2011 at 06:36:23PM -0500, Clark, Rob wrote:
> >> with this sort of approach, if a new device is attached after the
> >> first get_scatterlist, the buffer can be, if needed, migrated using the
> >> union of all the devices' requirements at a point in time when no DMA
> >> is active to/from the buffer.  But if all the devices are known up
> >> front, then you never need to migrate unnecessarily.
> >
> > Well, the problem is with devices that hang onto mappings for way too long,
> > so just waiting for all DMA to finish to be able to fix up the buffer
> > placement is a no-go. But I think we can postpone that issue a bit,
> > especially since the drivers that tend to do this (gpus) can also evict
> > objects willy-nilly, so that should be fixable with some explicit
> > kill_your_mappings callback attached to dma_buf_attachment (or full-blown
> > sync objects à la ttm).
>
> I'm ok if the weird fallback cases aren't fast.. I just don't want
> things to explode catastrophically in weird cases.
>
> I guess in the GPU / deep pipeline case, you can at least set up to
> get an interrupt back when the GPU is done with some surface (ie. when
> it gets to a certain point in the command-stream)?  I think it is ok
> if things stall in this case until the GPU pipeline is drained (and if
> you are targeting 60fps, that is probably still faster than video,
> likely at 30fps).  Again, this is just for the cases where userspace
> doesn't do what we want, to avoid just complete failure..
>
> If the GPU is the one importing the dmabuf, it just calls
> put_scatterlist() once it gets some interrupt from the GPU.  If the
> GPU is the one exporting the dmabuf, then get_scatterlist() just
> blocks until the exporting driver gets the interrupt from the GPU.  (Well, I
> guess then do you need get_scatterlist_interruptible()?)

The problem with gpus is that they eat through data so _fast_ that not
caching mappings kills performance. Now for simpler gpus we could shovel
the mapping code into the dma/dma_buf subsystem and cache things there.
 
But desktop gpus already have (or will get) support for per-process gpu
address spaces and I don't think it makes sense to put that complexity
into generic layers (nor is it imo feasible across different gpus -
per-process stuff tends to be tightly integrated with command submission).
So I think we need some explicit unmap_ASAP callback support, but
definitely not for v1 of dma_buf. But with attach separated from
get_scatterlist and an explicit struct dma_buf_attachment around, such an
extension should be pretty straightforward to implement.
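
That split makes sense to me. Just to double-check my own understanding of
the attach + get_scatterlist flow, here is roughly how I picture an importer
using it - purely a sketch, and every function and structure name below is
provisional (it may well look different once v2 is out):

static int example_import(struct dma_buf *dmabuf, struct device *dev)
{
	struct dma_buf_attachment *attach;
	struct scatterlist *sgl;

	/*
	 * 1) Declare interest early: the exporter can record this device's
	 *    constraints before any backing storage is pinned, so a later
	 *    allocation/migration can honour the union of all attached
	 *    devices' requirements.
	 */
	attach = dma_buf_attach(dmabuf, dev);
	if (IS_ERR(attach))
		return PTR_ERR(attach);

	/*
	 * 2) Map only when DMA is actually about to start; the exporter may
	 *    block (or migrate the buffer) here.
	 */
	sgl = dma_buf_get_scatterlist(attach);
	if (IS_ERR(sgl)) {
		dma_buf_detach(dmabuf, attach);
		return PTR_ERR(sgl);
	}

	/* ... program the device and run the DMA ... */

	/*
	 * 3) Drop the mapping as soon as the DMA-done interrupt fires, so
	 *    the exporter is free to move the buffer again.
	 */
	dma_buf_put_scatterlist(attach, sgl);
	dma_buf_detach(dmabuf, attach);
	return 0;
}
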
Thanks - and as Rob educated me on IRC (#linaro-mm-sig) a little while back,
we can keep the 'list of attachments' in dma_buf itself and do the generic
list handling centrally as well.
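
Concretely, I was thinking of something along these lines for keeping the
attachment list (and its locking) in dma_buf itself, so every exporter gets
the bookkeeping for free - again only a sketch, field and function names
provisional:

#include <linux/device.h>
#include <linux/err.h>
#include <linux/list.h>
#include <linux/mutex.h>
#include <linux/slab.h>

struct dma_buf {
	/* ... size, ops, file, exporter private data, ... */
	/* list and mutex initialised when the exporter creates the dma_buf */
	struct list_head attachments;	/* all attachments to this buffer */
	struct mutex lock;		/* protects the attachment list */
};

struct dma_buf_attachment {
	struct dma_buf *dmabuf;
	struct device *dev;		/* device attached to the buffer */
	struct list_head node;		/* member of dmabuf->attachments */
	void *priv;			/* exporter private, e.g. cached mapping */
};

struct dma_buf_attachment *dma_buf_attach(struct dma_buf *dmabuf,
					  struct device *dev)
{
	struct dma_buf_attachment *attach;

	attach = kzalloc(sizeof(*attach), GFP_KERNEL);
	if (!attach)
		return ERR_PTR(-ENOMEM);

	attach->dmabuf = dmabuf;
	attach->dev = dev;

	mutex_lock(&dmabuf->lock);
	list_add(&attach->node, &dmabuf->attachments);
	mutex_unlock(&dmabuf->lock);

	/*
	 * An optional exporter callback could be invoked here so it can note
	 * (or veto) the new device's constraints.
	 */
	return attach;
}
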
I should be able to post out v2 by tomorrow, I think - thanks to you gents!
~me.

--
Thanks and regards,
Sumit Semwal
Linaro Kernel Engineer - Graphics working group
Linaro.org │ Open source software for ARM SoCs
Follow Linaro: Facebook | Twitter | Blog