Hi all,
I think we have a memory-mapping issue with the ION carveout heap in the v3.4+ Android kernel.
The scenario is a user app + a kernel driver (CPU) + a kernel driver (DMA), where all three clients access the same memory, and the memory is cacheable.
The .map_kernel() of the carveout heap remaps the allocated memory buffer with ioremap().
In arm_ioremap(), we don't allow system memory to be mapped. To make .map_kernel() work, we need to use memblock_alloc() & memblock_remove() to move the heap memory from system RAM to a reserved area. So the linear address of the memory buffer is removed from the page table, and a kernel driver that wants to access the buffer gets a new virtual address from .map_kernel().
But ION uses dma_sync_sg_for_device() to flush the cache, which means it is using the linear address derived from the page. So it is using a virtual address that NO LONGER EXISTS, because memblock_remove() removed it.
Solution #1: .map_kernel() just returns the linear address. The limitation of this solution is that the heap must always lie in low memory; in return, we no longer need ioremap() or memblock_remove() at all.
Solution #2: use vmap() in .map_kernel().
What do you think of these two solutions?
Regards,
Haojian
On 8/23/2012 2:30 AM, Haojian Zhuang wrote:
Hi all,
I think we have a memory-mapping issue with the ION carveout heap in the v3.4+ Android kernel.
The scenario is a user app + a kernel driver (CPU) + a kernel driver (DMA), where all three clients access the same memory, and the memory is cacheable.
The .map_kernel() of the carveout heap remaps the allocated memory buffer with ioremap().
In arm_ioremap(), we don't allow system memory to be mapped. To make .map_kernel() work, we need to use memblock_alloc() & memblock_remove() to move the heap memory from system RAM to a reserved area. So the linear address of the memory buffer is removed from the page table, and a kernel driver that wants to access the buffer gets a new virtual address from .map_kernel().
But ION uses dma_sync_sg_for_device() to flush the cache, which means it is using the linear address derived from the page. So it is using a virtual address that NO LONGER EXISTS, because memblock_remove() removed it.
The problem is more fundamental than that. In setting up the sg_list, Ion calls phys_to_page on the carved-out memory. There's no guarantee in general that the page returned is valid at all, so if addr is a physical address of memory carved out with memblock_remove, page_to_phys(phys_to_page(addr)) != addr.
Internally, we've gotten around this by using the dma_address field in the sg_list to indicate carved-out memory. Obviously most APIs rely on sg_lists having a page, so this isn't very generic.
Really, this boils down to the question of whether the DMA APIs should support memory removed with memblock_remove and, if so, how?
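For illustration, a minimal sketch of the kind of carveout-marking helper described above (the helper name is hypothetical; the actual internal code isn't shown in this thread):

#include <linux/scatterlist.h>

/*
 * Sketch: mark a carved-out chunk by filling in only the DMA fields.
 * There is no valid struct page behind such an entry, so any generic
 * code that calls sg_page() on it will break -- hence "not very generic".
 */
static void sg_set_carveout(struct scatterlist *sg, dma_addr_t base,
                            unsigned int len)
{
        sg_dma_address(sg) = base;
        sg_dma_len(sg) = len;
}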
Solution #1: .map_kernel() just returns the linear address. The limitation of this solution is that the heap must always lie in low memory; in return, we no longer need ioremap() or memblock_remove() at all.
How would you reserve a large block of lowmem without calling memblock_remove? The other issue is that if the memory is carved out for Ion, it can't be used for anything else. The lowmem is pretty much wasted if it's not being used constantly.
Also, if you actually want uncached carveout memory this won't work because you can't get an uncached mapping.
Solution #2: use vmap() in .map_kernel().
vmap won't work because it relies on the page structures being valid.
What do you think of these two solutions?
Regards,
Haojian
Thanks, Laura
On Thu, Aug 23, 2012 at 10:49 PM, Laura Abbott <lauraa@codeaurora.org> wrote:
On 8/23/2012 2:30 AM, Haojian Zhuang wrote:
Solution #1: .map_kernel() just returns the linear address. The limitation of this solution is that the heap must always lie in low memory; in return, we no longer need ioremap() or memblock_remove() at all.
How would you reserve a large block of lowmem without calling memblock_remove? The other issue is that if the memory is carved out for Ion, it can't be used for anything else. The lowmem is pretty much wasted if it's not being used constantly.
Also, if you actually want uncached carveout memory this won't work because you can't get an uncached mapping.
Hmm, an uncached mapping is the real problem.
Solution #2: use vmap() in .map_kernel().
vmap won't work because it relies on the page structures being valid.
When the memory buffer is allocated, its physical address is also known, so we can get the page structures with phys_to_page(). We would also need to create a page array to store those page structures, as in the sketch below.
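A minimal sketch of that idea (assuming the carveout is left inside kernel-managed lowmem so phys_to_page() yields valid struct pages; the function name is illustrative):

#include <linux/mm.h>
#include <linux/vmalloc.h>

/* Sketch: build a page array for a physically contiguous carveout
 * buffer and map it with vmap(). Only valid if the memory was left in
 * the system map (reserved, not removed), so struct pages exist. */
static void *carveout_map_kernel(phys_addr_t base, size_t size)
{
        unsigned long npages = PAGE_ALIGN(size) >> PAGE_SHIFT;
        struct page **pages;
        unsigned long i;
        void *vaddr;

        pages = vmalloc(sizeof(struct page *) * npages);
        if (!pages)
                return NULL;
        for (i = 0; i < npages; i++)
                pages[i] = phys_to_page(base + i * PAGE_SIZE);
        vaddr = vmap(pages, npages, VM_MAP, PAGE_KERNEL);
        vfree(pages);
        return vaddr;
}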
On Thu, Aug 23, 2012 at 07:49:42AM -0700, Laura Abbott wrote:
The problem is more fundamental than that. In setting up the sg_list, Ion calls phys_to_page on the carved-out memory. There's no guarantee in general that the page returned is valid at all, so if addr is a physical address of memory carved out with memblock_remove, page_to_phys(phys_to_page(addr)) != addr.
Internally, we've gotten around this by using the dma_address field in the sg_list to indicate carved-out memory. Obviously most APIs rely on sg_lists having a page, so this isn't very generic.
Really, this boils down to the question of whether the DMA APIs should support memory removed with memblock_remove and, if so, how?
They should not, plain and simple. They're designed to be used with bits of memory which have valid struct page structures associated with them, and that's pretty fundamental to them.
However, this raises one question. You say you're using memblock_remove(). How? Show the code, I want to know.
Now, the thing is, using memblock_remove, you are removing the memory from the system memory map, taking it away from the system. You're taking full responsibility for managing that memory. The system won't map it for you in any way.
Now, since you have full control over that memory, why would you want the overhead of using the DMA API with it?
Moreover, you have one bloody big problem with this if you want to map the memory into userspace _and_ have it cacheable there, and do DMA off it. There is _no_ _way_ for userspace to issue the cache maintenance instructions (not that userspace _should_ be doing that anyway.) The cacheflush system call is also _NOT_ _DESIGNED_ for DMA purposes (it's there to support self-modifying code ONLY.)
Now, next question, what's this large buffer for?
On Fri, Aug 24, 2012 at 5:53 AM, Russell King - ARM Linux <linux@arm.linux.org.uk> wrote:
On Thu, Aug 23, 2012 at 07:49:42AM -0700, Laura Abbott wrote:
The problem is more fundamental than that. In setting up the sg_list, Ion calls phys_to_page on the carved-out memory. There's no guarantee in general that the page returned is valid at all, so if addr is a physical address of memory carved out with memblock_remove, page_to_phys(phys_to_page(addr)) != addr.
Internally, we've gotten around this by using the dma_address field in the sg_list to indicate carved-out memory. Obviously most APIs rely on sg_lists having a page, so this isn't very generic.
Really, this boils down to the question of whether the DMA APIs should support memory removed with memblock_remove and, if so, how?
They should not, plain and simple. They're designed to be used with bits of memory which have valid struct page structures associated with them, and that's pretty fundamental to them.
However, this raises one question. You say you're using memblock_remove(). How? Show the code, I want to know.
I don't know whether the code below is helpful. You can get a kernel with ION from git://android.git.kernel.org/kernel/common.git; the branch is android-3.4, and the ION driver is in drivers/gpu/ion.
static void __init brownstone_ion_reserve(void)
{
        /* reserve memory for ION */
        struct ion_platform_data *p = &ion_data;
        int i;

        for (i = 0; i < p->nr; i++) {
                p->heaps[i].base = memblock_alloc(p->heaps[i].size,
                                                  PAGE_SIZE);
                if (!p->heaps[i].base) {
                        pr_err("%s: failed to reserve %x bytes\n",
                               __func__, p->heaps[i].size);
                        goto err_alloc;
                }
                memblock_remove(p->heaps[i].base, p->heaps[i].size);
        }
        return;

err_alloc:
        /* unwind only the heaps that were successfully allocated */
        while (i--)
                memblock_free(p->heaps[i].base, p->heaps[i].size);
}
Now, the thing is, using memblock_remove, you are removing the memory from the system memory map, taking it away from the system. You're taking full responsibility for managing that memory. The system won't map it for you in any way.
.map_kernel() in the ION carveout heap calls arm_ioremap() directly. Since ARMv6+ doesn't allow the same memory to be mapped with different attributes, ioremap() refuses to remap system RAM. If we also use memblock_remove(), ioremap() can work.
Now, since you have full control over that memory, why would you want the overhead of using the DMA API with it?
Moreover, you have one bloody big problem with this if you want to map the memory into userspace _and_ have it cacheable there, and do DMA off it. There is _no_ _way_ for userspace to issue the cache maintenance instructions (not that userspace _should_ be doing that anyway.) The cacheflush system call is also _NOT_ _DESIGNED_ for DMA purposes (it's there to support self-modifying code ONLY.)
dmabuf can be used to hide the cache flush operations; ION integrates dmabuf.
When mmap() is invoked, the VMA is allocated but the mapping doesn't take effect yet.
When the user application accesses the memory buffer, it triggers a vm_fault and the new mapping takes effect. When DMA is ready to go, .map_dma() is called, which also triggers cache flushing. Once the cache flush is finished, the mapping is destroyed.
So the userspace application needn't maintain the cache flush operations itself.
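For reference, a minimal sketch of that fault-driven mapping (illustrative only -- struct buffer and its fields are hypothetical, and the real ION code differs):

/* Sketch of a lazy, fault-driven userspace mapping; assumes the vma
 * was set up with VM_PFNMAP and vm_private_data pointing at the buffer. */
struct buffer {
        phys_addr_t phys;       /* physical base of the carveout buffer */
};

static int buffer_vm_fault(struct vm_area_struct *vma, struct vm_fault *vmf)
{
        struct buffer *buf = vma->vm_private_data;
        unsigned long pfn = (buf->phys >> PAGE_SHIFT) + vmf->pgoff;

        /* Establish the PTE only when the page is first touched. */
        if (vm_insert_pfn(vma, (unsigned long)vmf->virtual_address, pfn))
                return VM_FAULT_SIGBUS;
        return VM_FAULT_NOPAGE;
}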
The key issue is that this cache flushing tries to use the virtual address derived from the page, e.g. via page_to_virt(), but that address was abandoned when memblock_remove() was called.
So I suggest using memblock_alloc() only to reserve the memory, and vmap() in .map_kernel(). Otherwise we can't avoid using memblock_remove().
Now, next question, what's this large buffer for?
The large buffer is used by the graphics driver, the camera driver, and graphics user applications. On some silicon, DMA requires contiguous physical addresses, so we have to reserve memory via the ION carveout heap.
On Fri, Aug 24, 2012 at 03:42:49PM +0800, Haojian Zhuang wrote:
On Fri, Aug 24, 2012 at 5:53 AM, Russell King - ARM Linux <linux@arm.linux.org.uk> wrote:
On Thu, Aug 23, 2012 at 07:49:42AM -0700, Laura Abbott wrote:
The problem is more fundamental than that. In setting up the sg_list, Ion calls phys_to_page on the carved-out memory. There's no guarantee in general that the page returned is valid at all, so if addr is a physical address of memory carved out with memblock_remove, page_to_phys(phys_to_page(addr)) != addr.
Internally, we've gotten around this by using the dma_address field in the sg_list to indicate carved-out memory. Obviously most APIs rely on sg_lists having a page, so this isn't very generic.
Really, this boils down to the question of whether the DMA APIs should support memory removed with memblock_remove and, if so, how?
They should not, plain and simple. They're designed to be used with bits of memory which have valid struct page structures associated with them, and that's pretty fundamental to them.
However, this raises one question. You say you're using memblock_remove(). How? Show the code, I want to know.
I don't know whether the code below is helpful. You can get a kernel with ION from git://android.git.kernel.org/kernel/common.git; the branch is android-3.4, and the ION driver is in drivers/gpu/ion.
static void __init brownstone_ion_reserve(void)
{
        /* reserve memory for ION */
        struct ion_platform_data *p = &ion_data;
        int i;

        for (i = 0; i < p->nr; i++) {
                p->heaps[i].base = memblock_alloc(p->heaps[i].size,
                                                  PAGE_SIZE);
                if (!p->heaps[i].base) {
                        pr_err("%s: failed to reserve %x bytes\n",
                               __func__, p->heaps[i].size);
                        goto err_alloc;
                }
                memblock_remove(p->heaps[i].base, p->heaps[i].size);
        }
        return;

err_alloc:
        /* unwind only the heaps that were successfully allocated */
        while (i--)
                memblock_free(p->heaps[i].base, p->heaps[i].size);
}
So yet again folk abuse this stuff...
Look at how arm_memblock_steal() works, and note that we DO NOT leave blocks allocated (reserved) but removed from the system memory. To have reserved areas listed without system memory is a bug.
Plus you're supposed to take 1MB at a time, not a page at a time.
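For comparison, a sketch of a steal-based reservation (the size and variable name are illustrative):

#include <asm/memblock.h>       /* arm_memblock_steal() */
#include <asm/sizes.h>

static phys_addr_t ion_heap_base;

/* Sketch: steal the carveout early, from the machine's .reserve hook.
 * arm_memblock_steal() removes the region from the system map entirely,
 * so it has no struct pages and may legitimately be ioremap()ed. */
static void __init board_reserve(void)
{
        ion_heap_base = arm_memblock_steal(16 * SZ_1M, SZ_1M);
}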
Now, the thing is, using memblock_remove, you are removing the memory from the system memory map, taking it away from the system. You're taking full responsibility for managing that memory. The system won't map it for you in any way.
.map_kernel() in the ION carveout heap calls arm_ioremap() directly. Since ARMv6+ doesn't allow the same memory to be mapped with different attributes, ioremap() refuses to remap system RAM. If we also use memblock_remove(), ioremap() can work.
Once you've removed it from the system using the arm_memblock_steal() you _can_ ioremap it all you want, because it won't have a struct page associated with it. It's quite possible that taking a page at a time results in the mm layers getting confused and providing a struct page for these things.
Now, since you have full control over that memory, why would you want the overhead of using the DMA API with it?
Moreover, you have one bloody big problem with this if you want to map the memory into userspace _and_ have it cacheable there, and do DMA off it. There is _no_ _way_ for userspace to issue the cache maintenance instructions (not that userspace _should_ be doing that anyway.) The cacheflush system call is also _NOT_ _DESIGNED_ for DMA purposes (it's there to support self-modifying code ONLY.)
dmabuf can be used to hide the cache flush operations; ION integrates dmabuf.
When mmap() is invoked, the VMA is allocated but the mapping doesn't take effect yet.
When the user application accesses the memory buffer, it triggers a vm_fault and the new mapping takes effect. When DMA is ready to go, .map_dma() is called, which also triggers cache flushing. Once the cache flush is finished, the mapping is destroyed.
So the userspace application needn't maintain the cache flush operations itself.
The key issue is that this cache flushing tries to use the virtual address derived from the page, e.g. via page_to_virt(), but that address was abandoned when memblock_remove() was called.
So I suggest using memblock_alloc() only to reserve the memory, and vmap() in .map_kernel(). Otherwise we can't avoid using memblock_remove().
That means you will have the kernel alias of the memory present in the system.
Now, next question, what's this large buffer for?
The large buffer is used by the graphics driver, the camera driver, and graphics user applications. On some silicon, DMA requires contiguous physical addresses, so we have to reserve memory via the ION carveout heap.
So all the usual reasons found with embedded systems.
The graphics hardware obviously needs contiguous scan out buffers but the graphics engine probably has its own MMU which can be used to avoid huge buffers for the graphics - especially when you use DRM instead of the legacy framebuffer stuff. This is the direction you _should_ be heading (which is where x86 went years ago because of these exact same problems) rather than trying to solve a problem in a less desirable way.
Can the camera hardware support capturing to a scatterlist? Does it have its own MMU, or can the DMA be programmed with a linked list of pages to operate on?
As for the user application, user applications really don't care where they get their memory from (unless they're intimately involved with the hardware.)
I'm currently working with the Vivante GPU on the Marvell Dove stuff, and I'm finding that the existing approach there of "we'll allocate a huge amount of system memory for the GPU" is just stupid software design, probably brought about by "this is how we've done it in the past." There's no need for that with this, because the GPU has its own MMU, and we can deal with pixmaps in non-contiguous memory just fine. And coupled with a proper DRM driver... it's all beginning to work quite nicely here without such large carveouts of main system memory.
On Fri, Aug 24, 2012 at 2:27 PM, Russell King - ARM Linux <linux@arm.linux.org.uk> wrote:
On Fri, Aug 24, 2012 at 03:42:49PM +0800, Haojian Zhuang wrote:
On Fri, Aug 24, 2012 at 5:53 AM, Russell King - ARM Linux <linux@arm.linux.org.uk> wrote:
On Thu, Aug 23, 2012 at 07:49:42AM -0700, Laura Abbott wrote:
The problem is more fundamental than that. In setting up the sg_list, Ion calls phys_to_page on the carved-out memory. There's no guarantee in general that the page returned is valid at all, so if addr is a physical address of memory carved out with memblock_remove, page_to_phys(phys_to_page(addr)) != addr.
Internally, we've gotten around this by using the dma_address field in the sg_list to indicate carved-out memory. Obviously most APIs rely on sg_lists having a page, so this isn't very generic.
Really, this boils down to the question of whether the DMA APIs should support memory removed with memblock_remove and, if so, how?
They should not, plain and simple. They're designed to be used with bits of memory which have valid struct page structures associated with them, and that's pretty fundamental to them.
However, this raises one question. You say you're using memblock_remove(). How? Show the code, I want to know.
I don't know whether the code below is helpful. You can get a kernel with ION from git://android.git.kernel.org/kernel/common.git; the branch is android-3.4, and the ION driver is in drivers/gpu/ion.
static void __init brownstone_ion_reserve(void)
{
        /* reserve memory for ION */
        struct ion_platform_data *p = &ion_data;
        int i;

        for (i = 0; i < p->nr; i++) {
                p->heaps[i].base = memblock_alloc(p->heaps[i].size,
                                                  PAGE_SIZE);
                if (!p->heaps[i].base) {
                        pr_err("%s: failed to reserve %x bytes\n",
                               __func__, p->heaps[i].size);
                        goto err_alloc;
                }
                memblock_remove(p->heaps[i].base, p->heaps[i].size);
        }
        return;

err_alloc:
        /* unwind only the heaps that were successfully allocated */
        while (i--)
                memblock_free(p->heaps[i].base, p->heaps[i].size);
}
So yet again folk abuse this stuff...
Look at how arm_memblock_steal() works, and note that we DO NOT leave blocks allocated (reserved) but removed from the system memory. To have reserved areas listed without system memory is a bug.
Plus you're supposed to take 1MB at a time, not a page at a time.
If we use arm_memblock_steal(), why is there a requirement to take 1MB at a time rather than a page? ioremap_page_range() should be able to map at page granularity, right?
Now, the thing is, using memblock_remove, you are removing the memory from the system memory map, taking it away from the system. You're taking full responsibility for managing that memory. The system won't map it for you in any way.
.map_kernel() in the ION carveout heap calls arm_ioremap() directly. Since ARMv6+ doesn't allow the same memory to be mapped with different attributes, ioremap() refuses to remap system RAM. If we also use memblock_remove(), ioremap() can work.
Once you've removed it from the system using the arm_memblock_steal() you _can_ ioremap it all you want, because it won't have a struct page associated with it. It's quite possible that taking a page at a time results in the mm layers getting confused and providing a struct page for these things.
Now, since you have full control over that memory, why would you want the overhead of using the DMA API with it?
Moreover, you have one bloody big problem with this if you want to map the memory into userspace _and_ have it cacheable there, and do DMA off it. There is _no_ _way_ for userspace to issue the cache maintenance instructions (not that userspace _should_ be doing that anyway.) The cacheflush system call is also _NOT_ _DESIGNED_ for DMA purposes (it's there to support self-modifying code ONLY.)
dmabuf can be used to hide the cache flush operations; ION integrates dmabuf.
When mmap() is invoked, the VMA is allocated but the mapping doesn't take effect yet.
When the user application accesses the memory buffer, it triggers a vm_fault and the new mapping takes effect. When DMA is ready to go, .map_dma() is called, which also triggers cache flushing. Once the cache flush is finished, the mapping is destroyed.
So the userspace application needn't maintain the cache flush operations itself.
The key issue is that this cache flushing tries to use the virtual address derived from the page, e.g. via page_to_virt(), but that address was abandoned when memblock_remove() was called.
So I suggest using memblock_alloc() only to reserve the memory, and vmap() in .map_kernel(). Otherwise we can't avoid using memblock_remove().
That means you will have the kernel alias of the memory present in the system.
Now, next question, what's this large buffer for?
The large buffer is used by the graphics driver, the camera driver, and graphics user applications. On some silicon, DMA requires contiguous physical addresses, so we have to reserve memory via the ION carveout heap.
So all the usual reasons found with embedded systems.
The graphics hardware obviously needs contiguous scan out buffers but the graphics engine probably has its own MMU which can be used to avoid huge buffers for the graphics - especially when you use DRM instead of the legacy framebuffer stuff. This is the direction you _should_ be heading (which is where x86 went years ago because of these exact same problems) rather than trying to solve a problem in a less desirable way.
Can the camera hardware support capturing to a scatterlist? Does it have its own MMU, or can the DMA be programmed with a linked list of pages to operate on?
As for the user application, user applications really don't care where they get their memory from (unless they're intimately involved with the hardware.)
I'm currently working with the Vivante GPU on the Marvell Dove stuff, and I'm finding that the existing approach there of "we'll allocate a huge amount of system memory for the GPU" is just stupid software design, probably brought about by "this is how we've done it in the past." There's no need for that with this, because the GPU has its own MMU, and we can deal with pixmaps in non-contiguous memory just fine. And coupled with a proper DRM driver... it's all beginning to work quite nicely here without such large carveouts of main system memory.
On Fri, Aug 24, 2012 at 4:57 PM, Russell King - ARM Linux <linux@arm.linux.org.uk> wrote:
On Fri, Aug 24, 2012 at 03:42:49PM +0800, Haojian Zhuang wrote:
On Fri, Aug 24, 2012 at 5:53 AM, Russell King - ARM Linux <linux@arm.linux.org.uk> wrote:
On Thu, Aug 23, 2012 at 07:49:42AM -0700, Laura Abbott wrote:
The problem is more fundamental than that. In setting up the sg_list, Ion calls phys_to_page on the carved-out memory. There's no guarantee in general that the page returned is valid at all, so if addr is a physical address of memory carved out with memblock_remove, page_to_phys(phys_to_page(addr)) != addr.
Internally, we've gotten around this by using the dma_address field in the sg_list to indicate carved-out memory. Obviously most APIs rely on sg_lists having a page, so this isn't very generic.
Really, this boils down to the question of whether the DMA APIs should support memory removed with memblock_remove and, if so, how?
They should not, plain and simple. They're designed to be used with bits of memory which have valid struct page structures associated with them, and that's pretty fundamental to them.
However, this raises one question. You say you're using memblock_remove(). How? Show the code, I want to know.
I don't know whether the code below is helpful. You can get a kernel with ION from git://android.git.kernel.org/kernel/common.git; the branch is android-3.4, and the ION driver is in drivers/gpu/ion.
static void __init brownstone_ion_reserve(void)
{
        /* reserve memory for ION */
        struct ion_platform_data *p = &ion_data;
        int i;

        for (i = 0; i < p->nr; i++) {
                p->heaps[i].base = memblock_alloc(p->heaps[i].size,
                                                  PAGE_SIZE);
                if (!p->heaps[i].base) {
                        pr_err("%s: failed to reserve %x bytes\n",
                               __func__, p->heaps[i].size);
                        goto err_alloc;
                }
                memblock_remove(p->heaps[i].base, p->heaps[i].size);
        }
        return;

err_alloc:
        /* unwind only the heaps that were successfully allocated */
        while (i--)
                memblock_free(p->heaps[i].base, p->heaps[i].size);
}
So yet again folk abuse this stuff...
Look at how arm_memblock_steal() works, and note that we DO NOT leave blocks allocated (reserved) but removed from the system memory. To have reserved areas listed without system memory is a bug.
Plus you're supposed to take 1MB at a time, not a page at a time.
Even if I add memblock_free(), it doesn't change the fact that the linear address mapping no longer exists, so the dmac_xxx() flushes fail.
Now, the thing is, using memblock_remove, you are removing the memory from the system memory map, taking it away from the system. You're taking full responsibility for managing that memory. The system won't map it for you in any way.
.map_kernel() in the ION carveout heap calls arm_ioremap() directly. Since ARMv6+ doesn't allow the same memory to be mapped with different attributes, ioremap() refuses to remap system RAM. If we also use memblock_remove(), ioremap() can work.
Once you've removed it from the system using the arm_memblock_steal() you _can_ ioremap it all you want, because it won't have a struct page associated with it. It's quite possible that taking a page at a time results in the mm layers getting confused and providing a struct page for these things.
Now, since you have full control over that memory, why would you want the overhead of using the DMA API with it?
Moreover, you have one bloody big problem with this if you want to map the memory into userspace _and_ have it cacheable there, and do DMA off it. There is _no_ _way_ for userspace to issue the cache maintenance instructions (not that userspace _should_ be doing that anyway.) The cacheflush system call is also _NOT_ _DESIGNED_ for DMA purposes (it's there to support self-modifying code ONLY.)
dmabuf can be used to hide the cache flush operations; ION integrates dmabuf.
When mmap() is invoked, the VMA is allocated but the mapping doesn't take effect yet.
When the user application accesses the memory buffer, it triggers a vm_fault and the new mapping takes effect. When DMA is ready to go, .map_dma() is called, which also triggers cache flushing. Once the cache flush is finished, the mapping is destroyed.
So the userspace application needn't maintain the cache flush operations itself.
The key issue is that this cache flushing tries to use the virtual address derived from the page, e.g. via page_to_virt(), but that address was abandoned when memblock_remove() was called.
So I suggest using memblock_alloc() only to reserve the memory, and vmap() in .map_kernel(). Otherwise we can't avoid using memblock_remove().
That means you will have the kernel alias of the memory present in the system.
Now, next question, what's this large buffer for?
The large buffer is used by the graphics driver, the camera driver, and graphics user applications. On some silicon, DMA requires contiguous physical addresses, so we have to reserve memory via the ION carveout heap.
Can the camera hardware support capturing to a scatterlist? Does it have its own MMU, or can the DMA be programmed with a linked list of pages to operate on?
If there's an IOMMU in the silicon, we can make use of a scatterlist. But there's no such device in low-end silicon.
On Fri, Aug 24, 2012 at 11:55:43PM +0800, Haojian Zhuang wrote:
On Fri, Aug 24, 2012 at 4:57 PM, Russell King - ARM Linux <linux@arm.linux.org.uk> wrote:
So yet again folk abuse this stuff...
Look at how arm_memblock_steal() works, and note that we DO NOT leave blocks allocated (reserved) but removed from the system memory. To have reserved areas listed without system memory is a bug.
Plus you're supposed to take 1MB at a time, not a page at a time.
Even if I add memblock_free(), it doesn't change the fact that the linear address mapping no longer exists, so the dmac_xxx() flushes fail.
Exactly, because those functions are only designed to be used on _system_ _memory_ - memory which has a valid struct page associated with it, and for which virt_to_phys()/phys_to_virt() (or more exactly, the kmap interfaces) can be relied upon to work.
That is not the case with memory which is not under the control of the kernel's memory management.
So you're really asking the impossible of the kernel.
If you need this memory to have struct pages, and you don't have the ARMv7 mapping aliasing problem to worry about, there is absolutely no need whatsoever to take it out of the system memory map with memblock_remove().
If you leave it in the block of memory which the kernel believes to be RAM, then yes, you can't use ioremap() on it, but you can kmap it, and you can also use DMA with it too - because it will be just like every other bit of normal system memory. It just won't be freed into the memory allocator pools, because you have it reserved.
You could also use vmap() on a list of struct pages to map it too.
Can the camera hardware support capturing to a scatterlist? Does it have its own MMU, or can the DMA be programmed with a linked list of pages to operate on?
If there's an IOMMU in the silicon, we can make use of a scatterlist. But there's no such device in low-end silicon.
Welcome to why the corners cut on "low-end" silicon cause major problems, some of which are extremely difficult to solve, and some of which are not even worth solving. :)
You're really getting into that territory here - you're asking the system for two things simultaneously which aren't compatible with each other.
On Fri, Aug 24, 2012 at 1:57 AM, Russell King - ARM Linux <linux@arm.linux.org.uk> wrote:
On Fri, Aug 24, 2012 at 03:42:49PM +0800, Haojian Zhuang wrote:
On Fri, Aug 24, 2012 at 5:53 AM, Russell King - ARM Linux <linux@arm.linux.org.uk> wrote:
On Thu, Aug 23, 2012 at 07:49:42AM -0700, Laura Abbott wrote:
The problem is more fundamental than that. In setting up the sg_list, Ion calls phys_to_page on the carved-out memory. There's no guarantee in general that the page returned is valid at all, so if addr is a physical address of memory carved out with memblock_remove, page_to_phys(phys_to_page(addr)) != addr.
Internally, we've gotten around this by using the dma_address field in the sg_list to indicate carved-out memory. Obviously most APIs rely on sg_lists having a page, so this isn't very generic.
Really, this boils down to the question of whether the DMA APIs should support memory removed with memblock_remove and, if so, how?
They should not, plain and simple. They're designed to be used with bits of memory which have valid struct page structures associated with them, and that's pretty fundamental to them.
However, this raises one question. You say you're using memblock_remove(). How? Show the code, I want to know.
I don't know whether the code below is helpful. You can get a kernel with ION from git://android.git.kernel.org/kernel/common.git; the branch is android-3.4, and the ION driver is in drivers/gpu/ion.
static void __init brownstone_ion_reserve(void)
{
        /* reserve memory for ION */
        struct ion_platform_data *p = &ion_data;
        int i;

        for (i = 0; i < p->nr; i++) {
                p->heaps[i].base = memblock_alloc(p->heaps[i].size,
                                                  PAGE_SIZE);
                if (!p->heaps[i].base) {
                        pr_err("%s: failed to reserve %x bytes\n",
                               __func__, p->heaps[i].size);
                        goto err_alloc;
                }
                memblock_remove(p->heaps[i].base, p->heaps[i].size);
        }
        return;

err_alloc:
        /* unwind only the heaps that were successfully allocated */
        while (i--)
                memblock_free(p->heaps[i].base, p->heaps[i].size);
}
So yet again folk abuse this stuff...
Look at how arm_memblock_steal() works, and note that we DO NOT leave blocks allocated (reserved) but removed from the system memory. To have reserved areas listed without system memory is a bug.
Plus you're supposed to take 1MB at a time, not a page at a time.
Now, the thing is, using memblock_remove, you are removing the memory from the system memory map, taking it away from the system. You're taking full responsibility for managing that memory. The system won't map it for you in any way.
.map_kernel() in the ION carveout heap calls arm_ioremap() directly. Since ARMv6+ doesn't allow the same memory to be mapped with different attributes, ioremap() refuses to remap system RAM. If we also use memblock_remove(), ioremap() can work.
Once you've removed it from the system using the arm_memblock_steal() you _can_ ioremap it all you want, because it won't have a struct page associated with it. It's quite possible that taking a page at a time results in the mm layers getting confused and providing a struct page for these things.
Now, since you have full control over that memory, why would you want the overhead of using the DMA API with it?
Moreover, you have one bloody big problem with this if you want to map the memory into userspace _and_ have it cacheable there, and do DMA off it. There is _no_ _way_ for userspace to issue the cache maintenance instructions (not that userspace _should_ be doing that anyway.) The cacheflush system call is also _NOT_ _DESIGNED_ for DMA purposes (it's there to support self-modifying code ONLY.)
dmabuf can be used to hide the cache flush operations; ION integrates dmabuf.
When mmap() is invoked, the VMA is allocated but the mapping doesn't take effect yet.
When the user application accesses the memory buffer, it triggers a vm_fault and the new mapping takes effect. When DMA is ready to go, .map_dma() is called, which also triggers cache flushing. Once the cache flush is finished, the mapping is destroyed.
So the userspace application needn't maintain the cache flush operations itself.
The key issue is that this cache flushing tries to use the virtual address derived from the page, e.g. via page_to_virt(), but that address was abandoned when memblock_remove() was called.
So I suggest using memblock_alloc() only to reserve the memory, and vmap() in .map_kernel(). Otherwise we can't avoid using memblock_remove().
That means you will have the kernel alias of the memory present in the system.
Now, next question, what's this large buffer for?
The large buffer is used by the graphics driver, the camera driver, and graphics user applications. On some silicon, DMA requires contiguous physical addresses, so we have to reserve memory via the ION carveout heap.
So all the usual reasons found with embedded systems.
The graphics hardware obviously needs contiguous scan out buffers but the graphics engine probably has its own MMU which can be used to avoid huge buffers for the graphics - especially when you use DRM instead of the legacy framebuffer stuff. This is the direction you _should_ be heading (which is where x86 went years ago because of these exact same problems) rather than trying to solve a problem in a less desirable way.
Can the camera hardware support capturing to a scatterlist? Does it have its own MMU, or can the DMA be programmed with a linked list of pages to operate on?
As for the user application, user applications really don't care where they get their memory from (unless they're intimately involved with the hardware.)
I'm currently working with the Vivante GPU on the Marvell Dove stuff, and I'm finding that the existing approach there of "we'll allocate a huge amount of system memory for the GPU" is just stupid software design, probably brought about by "this is how we've done it in the past." There's no need for that with this, because the GPU has its own MMU, and we can deal with pixmaps in non-contiguous memory just fine. And coupled with a proper DRM driver... it's all beginning to work quite nicely here without such large carveouts of main system memory.
We're starting to see the same thing come to fruition in silicon. Unfortunately for our use cases, we really need zero-copy graphics and multimedia, so those MMUs have to be available on every block -- ISP, graphics, video decoder, and display. If any one block in this chain can't support non-contiguous memory, we have to fall back to carveouts. Also, while some of the newest silicon meets this requirement, we still often need physically contiguous memory to support secure playback of DRM content; typically there aren't enough firewalls on the secure side to wall off a series of smaller allocations. For the short term, carveouts are probably here to stay, but again I see no reason to remove this memory from the system. It's too difficult to manage that way.
Rebecca
On Fri, Aug 24, 2012 at 12:42 AM, Haojian Zhuang <haojian.zhuang@gmail.com> wrote:
On Fri, Aug 24, 2012 at 5:53 AM, Russell King - ARM Linux <linux@arm.linux.org.uk> wrote:
On Thu, Aug 23, 2012 at 07:49:42AM -0700, Laura Abbott wrote:
The problem is more fundamental than that. In setting up the sg_list, Ion calls phys_to_page on the carved-out memory. There's no guarantee in general that the page returned is valid at all, so if addr is a physical address of memory carved out with memblock_remove, page_to_phys(phys_to_page(addr)) != addr.
Internally, we've gotten around this by using the dma_address field in the sg_list to indicate carved-out memory. Obviously most APIs rely on sg_lists having a page, so this isn't very generic.
Really, this boils down to the question of whether the DMA APIs should support memory removed with memblock_remove and, if so, how?
They should not, plain and simple. They're designed to be used with bits of memory which have valid struct page structures associated with them, and that's pretty fundamental to them.
However, this raises one question. You say you're using memblock_remove(). How? Show the code, I want to know.
I don't know whether the code below is helpful. You can get a kernel with ION from git://android.git.kernel.org/kernel/common.git; the branch is android-3.4, and the ION driver is in drivers/gpu/ion.
static void __init brownstone_ion_reserve(void)
{
        /* reserve memory for ION */
        struct ion_platform_data *p = &ion_data;
        int i;

        for (i = 0; i < p->nr; i++) {
                p->heaps[i].base = memblock_alloc(p->heaps[i].size,
                                                  PAGE_SIZE);
                if (!p->heaps[i].base) {
                        pr_err("%s: failed to reserve %x bytes\n",
                               __func__, p->heaps[i].size);
                        goto err_alloc;
                }
                memblock_remove(p->heaps[i].base, p->heaps[i].size);
        }
        return;

err_alloc:
        /* unwind only the heaps that were successfully allocated */
        while (i--)
                memblock_free(p->heaps[i].base, p->heaps[i].size);
}
FYI, I helpfully included ion_reserve in gpu/ion/ion.c so you don't have to code this yourself. Note that that function doesn't call memblock_remove; it only reserves the memory.
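A usage sketch, assuming the android-3.4 signature of ion_reserve():

/* Sketch: let the common helper do the reservation; it leaves the
 * memory in the system map rather than removing it. */
static void __init brownstone_ion_reserve(void)
{
        ion_reserve(&ion_data);
}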
Now, the thing is, using memblock_remove, you are removing the memory from the system memory map, taking it away from the system. You're taking full responsibility for managing that memory. The system won't map it for you in any way.
.map_kernel() in the ION carveout heap calls arm_ioremap() directly. Since ARMv6+ doesn't allow the same memory to be mapped with different attributes, ioremap() refuses to remap system RAM. If we also use memblock_remove(), ioremap() can work.
Now, since you have full control over that memory, why would you want the overhead of using the DMA API with it?
Moreover, you have one bloody big problem with this if you want to map the memory into userspace _and_ have it cacheable there, and do DMA off it. There is _no_ _way_ for userspace to issue the cache maintenance instructions (not that userspace _should_ be doing that anyway.) The cacheflush system call is also _NOT_ _DESIGNED_ for DMA purposes (it's there to support self-modifying code ONLY.)
dmabuf can be used to hide the cache flush operations; ION integrates dmabuf.
When mmap() is invoked, the VMA is allocated but the mapping doesn't take effect yet.
When the user application accesses the memory buffer, it triggers a vm_fault and the new mapping takes effect. When DMA is ready to go, .map_dma() is called, which also triggers cache flushing. Once the cache flush is finished, the mapping is destroyed.
So the userspace application needn't maintain the cache flush operations itself.
The key issue is that this cache flushing tries to use the virtual address derived from the page, e.g. via page_to_virt(), but that address was abandoned when memblock_remove() was called.
So I suggest using memblock_alloc() only to reserve the memory, and vmap() in .map_kernel(). Otherwise we can't avoid using memblock_remove().
Now, next question, what's this large buffer for?
The large buffer is used by the graphics driver, the camera driver, and graphics user applications. On some silicon, DMA requires contiguous physical addresses, so we have to reserve memory via the ION carveout heap.
On Thu, Aug 23, 2012 at 7:49 AM, Laura Abbott <lauraa@codeaurora.org> wrote:
On 8/23/2012 2:30 AM, Haojian Zhuang wrote:
Hi all,
I think we have a memory-mapping issue with the ION carveout heap in the v3.4+ Android kernel.
The scenario is a user app + a kernel driver (CPU) + a kernel driver (DMA), where all three clients access the same memory, and the memory is cacheable.
The .map_kernel() of the carveout heap remaps the allocated memory buffer with ioremap().
In arm_ioremap(), we don't allow system memory to be mapped. To make .map_kernel() work, we need to use memblock_alloc() & memblock_remove() to move the heap memory from system RAM to a reserved area. So the linear address of the memory buffer is removed from the page table, and a kernel driver that wants to access the buffer gets a new virtual address from .map_kernel().
But ION uses dma_sync_sg_for_device() to flush the cache, which means it is using the linear address derived from the page. So it is using a virtual address that NO LONGER EXISTS, because memblock_remove() removed it.
The problem is more fundamental than that. In setting up the sg_list, Ion calls phys_to_page on the carved-out memory. There's no guarantee in general that the page returned is valid at all, so if addr is a physical address of memory carved out with memblock_remove, page_to_phys(phys_to_page(addr)) != addr.
Internally, we've gotten around this by using the dma_address field in the sg_list to indicate carved-out memory. Obviously most APIs rely on sg_lists having a page, so this isn't very generic.
Really, this boils down to the question of whether the DMA APIs should support memory removed with memblock_remove and, if so, how?
Solution #1: .map_kernel() just returns the linear address. The limitation of this solution is that the heap must always lie in low memory; in return, we no longer need ioremap() or memblock_remove() at all.
How would you reserve a large block of lowmem without calling memblock_remove? The other issue is that if the memory is carved out for Ion, it can't be used for anything else. The lowmem is pretty much wasted if it's not being used constantly.
You can use memblock_reserve.
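For example, a sketch with memblock_reserve() (illustrative; it assumes the platform fixes each heap's base address up front):

#include <linux/memblock.h>

/* Sketch: reserve (don't remove) each ION heap region. Reserved
 * regions stay in the system map, so their struct pages and lowmem
 * linear mapping remain valid for the DMA APIs. */
static void __init board_ion_reserve(void)
{
        int i;

        for (i = 0; i < ion_data.nr; i++)
                memblock_reserve(ion_data.heaps[i].base,
                                 ion_data.heaps[i].size);
}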
Also, if you actually want uncached carveout memory this won't work because you can't get an uncached mapping.
Solution #2: use vmap() in .map_kernel().
vmap won't work because it relies on the page structures being valid.
What do you think of these two solutions?
Regards,
Haojian
Thanks, Laura
On Thu, Aug 23, 2012 at 2:30 AM, Haojian Zhuang <haojian.zhuang@gmail.com> wrote:
Hi all,
I think we have a memory-mapping issue with the ION carveout heap in the v3.4+ Android kernel.
The scenario is a user app + a kernel driver (CPU) + a kernel driver (DMA), where all three clients access the same memory, and the memory is cacheable.
The .map_kernel() of the carveout heap remaps the allocated memory buffer with ioremap().
In arm_ioremap(), we don't allow system memory to be mapped. To make .map_kernel() work, we need to use memblock_alloc() & memblock_remove() to move the heap memory from system RAM to a reserved area. So the linear address of the memory buffer is removed from the page table, and a kernel driver that wants to access the buffer gets a new virtual address from .map_kernel().
You are correct, there is a bug in map_kernel for the contiguous heap -- it has bit-rotted a bit, as it's not in use on the program I am currently working on. It shouldn't be using ioremap; it should use vmap for highmem, or it can just return the linear mapping for lowmem if appropriate. I will take a patch that fixes this, or fix it myself when I get a chance.
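A sketch of the fix being described (illustrative only, not the eventual patch; it assumes a physically contiguous buffer that doesn't straddle the lowmem/highmem boundary):

#include <linux/mm.h>
#include <linux/slab.h>
#include <linux/vmalloc.h>

/* Sketch: reuse the kernel linear mapping for lowmem buffers and
 * vmap() only the highmem case. */
static void *contig_heap_map_kernel(struct page *page, size_t size)
{
        unsigned long npages = PAGE_ALIGN(size) >> PAGE_SHIFT;
        struct page **pages;
        unsigned long i;
        void *vaddr;

        if (!PageHighMem(page))
                return page_address(page);      /* lowmem: already mapped */

        pages = kmalloc(sizeof(*pages) * npages, GFP_KERNEL);
        if (!pages)
                return NULL;
        for (i = 0; i < npages; i++)
                pages[i] = page + i;            /* physically contiguous */
        vaddr = vmap(pages, npages, VM_MAP, PAGE_KERNEL);
        kfree(pages);
        return vaddr;
}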
But ION uses dma_sync_sg_for_device() to flush the cache, which means it is using the linear address derived from the page. So it is using a virtual address that NO LONGER EXISTS, because memblock_remove() removed it.
When I modified ion to use the DMA APIs, I made a conscious decision to stop supporting memory that was removed from the kernel. In my mind, the benefits of the DMA APIs outweighed the small cost of keeping the metadata around. Memory for contiguous heaps can be set aside using memblock_reserve instead.
Solution #1: .map_kernel() just returns the linear address. The limitation of this solution is that the heap must always lie in low memory; in return, we no longer need ioremap() or memblock_remove() at all.
Solution #2: use vmap() in .map_kernel().
What do you think of these two solutions?
Regards,
Haojian
On 8/24/2012 12:09 PM, Rebecca Schultz Zavin wrote:
You are correct, there is a bug in map_kernel for the contiguous heap -- it has bit-rotted a bit, as it's not in use on the program I am currently working on. It shouldn't be using ioremap; it should use vmap for highmem, or it can just return the linear mapping for lowmem if appropriate. I will take a patch that fixes this, or fix it myself when I get a chance.
But ION uses dma_sync_sg_for_device() to flush the cache, which means it is using the linear address derived from the page. So it is using a virtual address that NO LONGER EXISTS, because memblock_remove() removed it.
When I modified ion to use the DMA APIs, I made a conscious decision to stop supporting memory that was removed from the kernel. In my mind, the benefits of the DMA APIs outweighed the small cost of keeping the metadata around. Memory for contiguous heaps can be set aside using memblock_reserve instead.
I'm concerned about dropping support for memblock_remove'd memory. On ARMv7 targets, this restricts carveout regions to highmem if we need writecombined memory, to avoid the multiple-mapping problems. This thread has established that using the DMA APIs on memory removed with memblock_remove can't happen, but what about memory that just needs to be uncached? That seems like an appropriate use of memblock_remove, especially for lowmem areas.
Also, what about targets that lack highmem?
Laura
On Mon, Aug 27, 2012 at 9:25 AM, Laura Abbott <lauraa@codeaurora.org> wrote:
On 8/24/2012 12:09 PM, Rebecca Schultz Zavin wrote:
You are correct, there is a bug in map_kernel for the contiguous heap -- it has bit-rotted a bit, as it's not in use on the program I am currently working on. It shouldn't be using ioremap; it should use vmap for highmem, or it can just return the linear mapping for lowmem if appropriate. I will take a patch that fixes this, or fix it myself when I get a chance.
But ION uses dma_sync_sg_for_device() to flush the cache, which means it is using the linear address derived from the page. So it is using a virtual address that NO LONGER EXISTS, because memblock_remove() removed it.
When I modified ion to use the DMA APIs, I made a conscious decision to stop supporting memory that was removed from the kernel. In my mind, the benefits of the DMA APIs outweighed the small cost of keeping the metadata around. Memory for contiguous heaps can be set aside using memblock_reserve instead.
I'm concerned about dropping support for memblock_remove'd memory. On ARMv7 targets, this restricts carveout regions to highmem if we need writecombined memory, to avoid the multiple-mapping problems. This thread has established that using the DMA APIs on memory removed with memblock_remove can't happen, but what about memory that just needs to be uncached? That seems like an appropriate use of memblock_remove, especially for lowmem areas.
Also, what about targets that lack highmem?
You don't have to have carveout regions in highmem. If you read the section of the ARM reference manual on mismatched memory attributes, you'll see they've qualified the conditions that cause problems -- at one point I think it just said the behavior was undefined. The issues only occur if you have two mappings with different shareability attributes. In our case we have two normal mappings, one for the unity map and one for ion. In this case you must properly manage the two mappings in software to ensure coherency, i.e. flush and invalidate when necessary between accesses to the two mappings, but otherwise no issues will occur. Note that I've checked this with both ARM and Qualcomm architects to make sure it is correct. Hope this helps clarify things for you.
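For illustration, managing coherency between the two normal-memory aliases in software might look like this sketch (using the 3.4-era ARM cache helpers; the wrapper name and parameters are illustrative):

#include <asm/cacheflush.h>
#include <asm/outercache.h>

/* Sketch: write back and invalidate one alias's cache lines before the
 * buffer is accessed through the other mapping (or by a device). */
static void flush_alias(void *vaddr, phys_addr_t phys, size_t size)
{
        dmac_flush_range(vaddr, (char *)vaddr + size);  /* inner caches */
        outer_flush_range(phys, phys + size);           /* outer cache, e.g. PL310 */
}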
Rebecca
Laura