Hi Daniel,
On 09/28/2011 04:22 PM, Daniel Vetter wrote:
On Wed, Sep 28, 2011 at 10:51:08AM +0200, Hans Verkuil wrote:
On Tuesday, September 27, 2011 16:19:56 Daniel Vetter wrote:
Hi Hans,
I'll try to explain a bit; after all, I've been pushing this attachment business quite a bit.
On Tue, Sep 27, 2011 at 03:24:24PM +0200, Hans Verkuil wrote:
OK, it is not clear to me what the purpose is of the attachments.
If I understand the discussion from this list correctly, then the idea is that each device that wants to use this buffer first attaches itself by calling dma_buf_attach().
Then at some point the application asks some driver to export the buffer. So the driver calls dma_buf_export() and passes its own dma_buf_ops. In other words, this becomes the driver that 'controls' the memory, right?
Actually, the ordering is the other way round. First, some driver calls dma_buf_export; userspace then passes the fd around to all other drivers, which do an import and call attach. While all this happens, the driver that exported the dma_buf does not (yet) allocate any backing storage.
Ah. This really should be documented better :-)
Another driver that receives the fd will call dma_buf_get() and can then call e.g. get_scatterlist from dma_buf->ops. (As an aside: I would make inline functions that take a dma_buf pointer and call the corresponding op, rather than requiring drivers to go through ops directly)
Well, drivers should only call get_scatterlist when they actually need to access the memory. This way the originating driver can go through the list of all attached devices and decide where to allocate backing storage on the first get_scatterlist.
Timing is something I am worried about with the above approach. At Linaro Connect in Cambridge, I raised the idea of drivers registering their memory requirement capabilities at startup (and vice versa). I will try to elaborate on it again below:
Driver A and driver B wish to share the same pool of buffers. For some reason the two have different buffer allocation requirements (one needs a contiguous allocation, the other can use its own IOMMU to manage the buffer chunks).

When driver A's probe is called, it will *also* register its capabilities with some buffer bank (for v4l2 we can think of extending vb2). Similarly, driver B will register its capabilities. Where to keep this piece of metadata, I am not sure at this point.

When the first allocation call happens (from the application, which may call A or B in whatever order), the allocator will check the common set of buffer requirements and then make an appropriate allocation, going through all the drivers' buffer capability information (which should be quick). Even if the application asks the drivers to release the buffers and requests them again some time later, the allocation will not take much time, as the metadata will already be held by the (central) buffer bank.

Having said that, I am again not sure how to handle the scenario where the buffer capabilities of the two devices don't match at all. Maybe such cases can be taken care of during the driver design itself.
For the above, I was thinking of a multimedia application like a camera. The user may open the camera app, which starts showing a preview. He may choose to close it and then immediately reopen it for his own crazy reasons. If we take too much time just deciding what kind of buffer to allocate, the start-up time of the application will be too long.
Unless I am mistaken, there is nothing in the attachment data structure at the moment to help determine the best device to use for the scatterlist. Right? It's just a list of devices together with an opaque pointer.
Well, I want a struct device pointer in there, not something opaque. But yes, an attachment is just bookkeeping.
Another thing I don't get (sorry, I must be really dense today) is that the get_scatterlist op is set by the dma_buf_export call. I would expect that op to be part of the attachment data, since I would expect it to be device-specific.
There's a dma_buf_attachment parameter so you can get at the device the mapping is for.
At least, my assumption is that the actual memory is allocated by the first call to get_scatterlist?
Yes.
Actually, I think this might be key: who allocates the actual memory and when? There is no documentation whatsoever on this crucial topic.
Ok, let me elaborate a bit on how I see this. The memory will be allocated by whichever driver exported the dma_buf originally. We'll probably end up with a bunch of helper functions for the common case, but this approach allows crazy stuff like sharing special memory (e.g. TILER on omap) which can only be accessed by certain devices. I think this is easiest with a few examples:
On sane hw, e.g. a buffer exported by i915 on x86: attach is a noop (and hence should be optional); in get_scatterlist I'll just create an sg_list out of the struct page * array I get from the gem backing storage, then call map_sg and return that sg_list (now mapped into the device address space).
On insane hw with strange dma requirements (like dram bank placement, contiguous allocations, huge pages for puny tlbs, ...): attach could check whether the buffer is already fixed or not (e.g. the kernel fb console probably can't be moved) and, if so, whether the fixed buffer can be mapped by the device. That's why attach needs a return value. If the buffer is not pinned or not yet allocated, do nothing.
On get_scatterlist walk the attachment list and grab the dma requirements of all devices (via some platform/dma specific stuff attached to struct device). Then walk through a list of dma allocators your platform provides (page_alloc, cma, ...) until you have one that satisfies the requirements and alloc the backing storage.
Then create an sg_list out of it and map_sg it, like above.
For arm we probably need some helper functions to make this easier.
Sharing of a special buffer object, like a tiled buffer remapped into omap's TILER: attach would check whether the device can actually access that special io area (iirc on omap only certain devices can access the TILER). get_scatterlist would just create a 1-entry sg_list that points directly to the physical address (in device space) of the remapped buffer. This is why I want get_scatterlist to return a mapped buffer (and for api cleanliness).
Cheers, Daniel
Regards, Subash