Hi Arnd,
On Fri, Dec 09, 2011 at 02:13:03PM +0000, Arnd Bergmann wrote:
On Thursday 08 December 2011, Daniel Vetter wrote:
c) only allowing streaming mappings, even if those are non-coherent (requiring strict serialization between CPU (in-kernel) and dma users of the buffer)
I think only allowing streaming access makes the most sense:
- I don't see much (if any need) for the kernel to access a dma_buf -
in all current usecases it just contains pixel data and no hw-specific things (like sg tables, command buffers, ..). At most I see the need for the kernel to access the buffer for dma bounce buffers, but that is internal to the dma subsystem (and hence does not need to be exposed).
- Userspace can still access the contents through the exporting
subsystem (e.g. use some gem mmap support). For efficiency reason gpu drivers are already messing around with cache coherency in a platform specific way (and hence violated the dma api a bit), so we could stuff the mmap coherency in there, too. When we later on extend dma_buf support so that other drivers than the gpu can export dma_bufs, we can then extend the official dma api with already a few drivers with use-patterns around.
But I still think that the kernel must not be required to enforce correct access ordering for the reasons outlined in my other mail.
I still don't think that's possible. Please explain how you expect to change the semantics of the streaming mapping API to allow multiple mappers without having explicit serialization points that are visible to all users. For simplicity, let's assume a cache coherent system with bounce buffers where map() copies the buffer to a dma area and unmap() copies it back to regular kernel memory. How does a driver know if it can touch the buffer in memory or from DMA at any given point in time? Note that this problem is the same as the cache coherency problem but may be easier to grasp.
(I'm jumping into the discussion in the middle, and might miss something that has already been talked about. I still hope what I'm about to say is relevant. :-))
In subsystems such as V4L2 where drivers deal with such large buffers, the buffers stay mapped all the time. The user explicitly gives the control of the buffers to the driver and eventually gets them back. This is already part of those APIs, whether they're using dma_buf or not. The user could have, and often has, the same buffers mapped elsewhere.
When it comes to passing these buffers between different hardware devices, either V4L2 or not, the user might not want to perform extra cache flush when the buffer memory itself is not being touched by the CPU in the process at all. I'd consider it impossible for the driver to know how the user space intends to user the buffer.
Flushing the cache is quite expensive: typically it's the best to flush the whole data cache when one needs to flush buffers. The V4L2 DQBUF and QBUF IOCTLs already have flags to suggest special cache handling for buffers.
Kind regards,