On Fri, Aug 24, 2012 at 2:27 PM, Russell King - ARM Linux linux@arm.linux.org.uk wrote:
On Fri, Aug 24, 2012 at 03:42:49PM +0800, Haojian Zhuang wrote:
On Fri, Aug 24, 2012 at 5:53 AM, Russell King - ARM Linux linux@arm.linux.org.uk wrote:
On Thu, Aug 23, 2012 at 07:49:42AM -0700, Laura Abbott wrote:
The problem is more fundamental than that. In setting up the sg_list, Ion calls phys_to_page on the carved out memory. There's no guarantee in general that the page returned is valid at all so if addr is a physical address of memory carved out with memblock_remove, page_to_phys(phys_to_page(addr)) != addr.
Internally, we've gotten around this by using the dma_address field in the sg_list to indicate carved out memory. Obviously most APIs rely on sg_lists having a page so this isn't very generic.
Really this problem boils down to the question of should the dma APIs support memory removed with memblock_remove and if so, how?
They should not, plain and simple. They're designed to be used with bits of memory which have valid struct page structures associated with them, and that's pretty fundamental to them.
However, this raises one question. You say you're using memblock_remove(). How? Show the code, I want to know.
I don't know whether the below code is helpful. And you could get kernel with ION from git://android.git.kernel.org/kernel/common.git. The branch is android-3.4. ION driver is in drivers/gpu/ion.
static void __init brownstone_ion_reserve(void) { /* reserve memory for ION */ struct ion_platform_data *p = &ion_data; int i;
for (i = 0; i < p->nr; i++) { p->heaps[i].base = memblock_alloc(p->heaps[i].size, PAGE_SIZE); if (!p->heaps[i].base) { pr_err("%s: failed to reserve %x bytes\n", __func__, p->heaps[i].size); goto err_alloc; } memblock_remove(p->heaps[i].base, p->heaps[i].size); } return;
err_alloc: for (; i > 0; i--) memblock_free(p->heaps[i].base, p->heaps[i].size);
So yet again folk abuse this stuff...
Look at how arm_memblock_steal() works, and note that we DO NOT leave blocks allocated(reserved) but removed from the system memory. To have reserved areas listed without system memory is a bug.
Plus you're supposed to take 1MB at a time, not a page at a time.
If we use arm_memblock_steal(), why is the requirement for taking 1MB at a time, not a page? The ioremap_page_range() should be able to map to do sub-page mapping. Right?
Now, the thing is, using memblock_remove, you are removing the memory from the system memory map, taking it away from the system. You're taking full responsibility for managing that memory. The system won't map it for you in any way.
.map_kernel() in ION carveout heap calls arm_ioremap() directly. Since we can't map memory with different mapping in ARM V6+, ioremap() blocks the behavior. If we use memblock_remove() also, ioremap() can work.
Once you've removed it from the system using the arm_memblock_steal() you _can_ ioremap it all you want, because it won't have a struct page associated with it. It's quite possible that taking a page at a time results in the mm layers getting confused and providing a struct page for these things.
Now, since you have full control over that memory, why would you want the overhead of using the DMA API with it?
Moreover, you have one bloody big problem with this if you want to map the memory into userspace _and_ have it cacheable there, and do DMA off it. There is _no_ _way_ for userspace to issue the cache maintanence instructions (not that userspace _should_ be doing that anyway.) The cacheflush system call is also _NOT_ _DESIGNED_ for DMA purposes (it's there to support self-modifying code ONLY.)
dmabuf can be used to hide cache flush operation. ION integrates dmabuf.
While mmap() is inovked, the vma area is allocated but mapping doesn't take effect.
While user application access memory buffer, it triggers vm_fault. And the new mapping takes effect. If DMA is ready to go, .map_dma() is called that also triggers cache flushing. Once cache flush is finished, the mapping is destroyed.
So user space application needn't to maintain the cache flush operation.
The key issue is that those cache flushing is trying to use virtual address from page, like page_to_virt(). But the address is abandoned while memblock_remove() is called.
So I suggest to use memblock_alloc() only to reserve memory. And use vmap() in .map_kernel(). Otherwise, we can't avoid to use memblock_remove().
That means you will have the kernel alias of the memory present in the system.
Now, next question, what's this large buffer for?
Large buffer is used for graphics driver, camera driver, graphics user application. In some silicons, DMA is required continuous physical address. We have to reserve memory by ION carveout heap.
So all the usual reasons found with embedded systems.
The graphics hardware obviously needs contiguous scan out buffers but the graphics engine probably has its own MMU which can be used to avoid huge buffers for the graphics - especially when you use DRM instead of the legacy framebuffer stuff. This is the direction you _should_ be heading (which is where x86 went years ago because of these exact same problems) rather than trying to solve a problem in a less desirable way.
Can the camera hardware support capturing to scatterlist? Does it have it's own MMU or can the DMA be programmed with a linked-list of pages to operate on?
As for the user application, user applications really don't care where they get their memory from (unless they're intimately involved with the hardware.)
I'm currently working with the Vivante GPU on the Marvell Dove stuff, and I'm finding that the existing approach there of "we'll allocate a huge amount of system memory for the GPU" is just stupid software design, probably brought about by "this is how we've done it in the past." There's no need for that with this, because the GPU has its own MMU, and we can deal with pixmaps in non-contiguous memory just fine. And coupled with a proper DRM driver... it's all beginning to work quite nicely here without such large carveouts of main system memory.
Linaro-mm-sig mailing list Linaro-mm-sig@lists.linaro.org http://lists.linaro.org/mailman/listinfo/linaro-mm-sig