Hello,
On Sunday, July 03, 2011 5:28 PM Russell King wrote:
On Mon, Jun 20, 2011 at 09:50:06AM +0200, Marek Szyprowski wrote:
This patch removes the need for offset parameter in dma bounce functions. This is required to let dma-mapping framework on ARM architecture use common, generic dma-mapping helpers.
I really don't like this. Really as in hate. Why? I've said in the past that the whole idea of getting rid of the sub-range functions is idiotic.
If you have to deal with cache coherence, what you _really_ want is an API which tells you the size of the original buffer and the section of that buffer which you want to handle - because the edges of the buffer need special handling.
Lets say that you have a buffer which is 256 bytes long, misaligned to half a cache line. Let's first look at the sequence for whole-buffer:
You map it for DMA from the device. This means you writeback the first and last cache lines to perserve any data shared in the overlapping cache line. The remainder you can just invalidate.
You want to access the buffer, so you use the sync_for_cpu function. If your CPU doesn't do any speculative prefetching, then you don't need to do anything. If you do, you have to invalidate the buffer, but you must preserve the overlapping cache lines which again must be written back.
You transfer ownership back to the device using sync_for_device. As you may have caused cache lines to be read in, again you need to invalidate, and the overlapping cache lines must be written back.
Now, if you ask for a sub-section of the buffer to be sync'd, you can actually eliminate those writebacks which are potentially troublesome, and which could corrupt neighbouring data.
If you get rid of the sub-buffer functions and start using the whole buffer functions for that purpose, you no longer know whether the partial cache lines are part of the buffer or not, so you have to write those back every time too.
So far, we haven't had any reports of corruption of this type (maybe folk using the sync functions are rare on ARM - thankfully) but getting rid of the range sync functions means that solving this becomes a lot more difficult because we've lost the information to make the decision.
Well, right now I haven't heard anyone who wants to remove dma_sync_single_range_for_{cpu,device}. All this is about internal implementation and dma_map_ops which uses the simplified calls, not exposed to the drivers or any public API.
I also see no reason why we loose the information. All drivers are still required to call dma_map_{single,page} to aquire dma address first. This way DMA mapping subsystem perfectly knows that the range from returned dma_addr to dma_addr+size has been used for dma operations. All calls to dma_sync_single_* operations takes dma_addr as one of the arguments, so there is no problem to check which dma range this particular sync operation fits.
In my patch I have shown that it is perfectly possible to use the common dma_map_ops structure on ARM and unify dma mapping implementation a bit with other architectures.
IMHO this is the right way. There is a need for custom dma mapping implementations (mainly related to taking the advantage of iommu controllers available on newer SoCs). I would really like to avoid another set of ifdefs or sequences of "if (iommu_supported())" all over the dma-mapping code. Even now all this code is hard to understand in the first read (due to coherent/ non-coherent sub-architectures and dmabounce code mixed in).
So I've always believed - and continue to do so - that those who want to get rid of the range sync functions are misguided and are storing up problems for the future.
I never said that I want to remove these operations from drivers API.
Best regards