Hi, All
We hit a problem with dmac_map_area & dmac_flush_range on user-space addresses: the mcr instruction does not return on an ARMv7 processor.
The existing ION carveout heap does not support partial cache flush; the whole buffer is flushed every time.
There is only one dirty bit for the carveout heap, because sg_table->nents is 1: in drivers/gpu/ion/ion_carveout_heap.c, ion_carveout_heap_map_dma() calls sg_alloc_table(table, 1, GFP_KERNEL), and ion_buffer_alloc_dirty() sizes the dirty map from pages = buffer->sg_table->nents.
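For reference, the path in question looks roughly like this (a paraphrased sketch of the driver code, not the exact source):

/* Paraphrased sketch of drivers/gpu/ion/ion_carveout_heap.c -- not
 * the exact source.  The whole buffer gets a single sg entry. */
struct sg_table *ion_carveout_heap_map_dma(struct ion_heap *heap,
                                           struct ion_buffer *buffer)
{
        struct sg_table *table;
        int ret;

        table = kzalloc(sizeof(*table), GFP_KERNEL);
        if (!table)
                return ERR_PTR(-ENOMEM);
        ret = sg_alloc_table(table, 1, GFP_KERNEL);   /* one entry */
        if (ret) {
                kfree(table);
                return ERR_PTR(ret);
        }
        sg_set_page(table->sgl, phys_to_page(buffer->priv_phys),
                    buffer->size, 0);
        return table;
}

Since ion_buffer_alloc_dirty() sizes its bitmap from buffer->sg_table->nents == 1, the whole buffer is covered by a single dirty bit.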
We want to support partial cache flush, aligned to the cache line instead of PAGE_SIZE, for efficiency. We considered extending the dirty bitmap, but that appears to allow only PAGE_SIZE alignment.
As an experiment we modified the ION_IOC_SYNC ioctl on ARMv7 to call dmac_map_area & dmac_flush_range directly with the address from user space. However, dmac_map_area does not work with this user-space address; more precisely, the mcr instruction cannot handle it and hangs. Also, ion_vm_fault happens twice. The first time is from __dabt_usr, when we access the mmapped buffer, and that is fine. The second is from __dabt_svc, caused by the mcr, which is strange.
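In essence the modification did this (a minimal sketch, not our exact patch; the wrapper function and its name are ours):

/* Minimal sketch of the experiment -- not the exact patch.  The
 * range (addr, len) arrives straight from user space through the
 * modified ION_IOC_SYNC ioctl. */
static void ion_sync_user_range_experiment(unsigned long addr, size_t len)
{
        /* Both helpers expect kernel linear-map addresses; handing
         * them the raw user-space address is where the mcr hangs. */
        dmac_map_area((void *)addr, len, DMA_BIDIRECTIONAL);
        dmac_flush_range((void *)addr, (void *)(addr + len));
}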
The sequence is:
1. ION allocates from the carveout heap, and user space mmaps the buffer.
2. User space accesses the mapped address; ion_vm_fault (from __dabt_usr) builds the page table entry via vm_insert_page.
3. dmac_map_area & dmac_flush_range are called with that address; ion_vm_fault fires again (from __dabt_svc) and the mcr hangs.
We do not understand why ion_vm_fault happens twice when the page table has already been built, or why the mcr hangs with an address from user space.
Besides, there is no problem with ION on 3.0, which does not use ion_vm_fault.
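For reference, the fault handler involved does roughly this (paraphrased from ion.c as we understand it, simplified to the contiguous carveout case; not the exact source):

/* Paraphrased, simplified sketch of ion_vm_fault() -- not the
 * exact source.  On first touch it maps the faulting page into
 * the user page table and marks it dirty. */
static int ion_vm_fault(struct vm_area_struct *vma, struct vm_fault *vmf)
{
        struct ion_buffer *buffer = vma->vm_private_data;

        mutex_lock(&buffer->lock);
        set_bit(vmf->pgoff, buffer->dirty);   /* per-page dirty map */
        vm_insert_page(vma, (unsigned long)vmf->virtual_address,
                       sg_page(buffer->sg_table->sgl) + vmf->pgoff);
        mutex_unlock(&buffer->lock);
        return VM_FAULT_NOPAGE;
}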
Any suggestions?
Thanks
On Mon, Aug 27, 2012 at 9:51 AM, zhangfei gao zhangfei.gao@gmail.com wrote:
> Hi, All
> We hit a problem with dmac_map_area & dmac_flush_range on user-space addresses: the mcr instruction does not return on an ARMv7 processor.
> The existing ION carveout heap does not support partial cache flush; the whole buffer is flushed every time.
> There is only one dirty bit for the carveout heap, because sg_table->nents is 1: in drivers/gpu/ion/ion_carveout_heap.c, ion_carveout_heap_map_dma() calls sg_alloc_table(table, 1, GFP_KERNEL), and ion_buffer_alloc_dirty() sizes the dirty map from pages = buffer->sg_table->nents.
> We want to support partial cache flush, aligned to the cache line instead of PAGE_SIZE, for efficiency. We considered extending the dirty bitmap, but that appears to allow only PAGE_SIZE alignment.
> As an experiment we modified the ION_IOC_SYNC ioctl on ARMv7 to call dmac_map_area & dmac_flush_range directly with the address from user space. However, dmac_map_area does not work with this user-space address; more precisely, the mcr instruction cannot handle it and hangs.
Let me summarize it. First, the user-space address is mapped. Then flushing of that user-space address is triggered. It's a workaround for the missing kernel virtual address, rather than a proper fix via vmap() or some other solution; it's just a quick fix.
Zhangfei, I suspect the issue may be caused by a missing memory barrier. The flushing uses coprocessor instructions, so it's a little different.
On Mon, Aug 27, 2012 at 1:46 PM, Haojian Zhuang haojian.zhuang@gmail.com wrote:
> On Mon, Aug 27, 2012 at 9:51 AM, zhangfei gao zhangfei.gao@gmail.com wrote:
>> Hi, All
>> We hit a problem with dmac_map_area & dmac_flush_range on user-space addresses: the mcr instruction does not return on an ARMv7 processor.
>> The existing ION carveout heap does not support partial cache flush; the whole buffer is flushed every time.
>> There is only one dirty bit for the carveout heap, because sg_table->nents is 1: in drivers/gpu/ion/ion_carveout_heap.c, ion_carveout_heap_map_dma() calls sg_alloc_table(table, 1, GFP_KERNEL), and ion_buffer_alloc_dirty() sizes the dirty map from pages = buffer->sg_table->nents.
>> We want to support partial cache flush, aligned to the cache line instead of PAGE_SIZE, for efficiency. We considered extending the dirty bitmap, but that appears to allow only PAGE_SIZE alignment.
>> As an experiment we modified the ION_IOC_SYNC ioctl on ARMv7 to call dmac_map_area & dmac_flush_range directly with the address from user space. However, dmac_map_area does not work with this user-space address; more precisely, the mcr instruction cannot handle it and hangs.
> Let me summarize it. First, the user-space address is mapped. Then flushing of that user-space address is triggered. It's a workaround for the missing kernel virtual address, rather than a proper fix via vmap() or some other solution; it's just a quick fix.
> Zhangfei, I suspect the issue may be caused by a missing memory barrier. The flushing uses coprocessor instructions, so it's a little different.
Is there any limitation preventing dmac_map_area & dmac_flush_range from supporting an address mapped from user space? The mcr does not return with a user-space address, and a page fault happens via __dabt_svc even though the page table has already been set up.
Thanks
On Mon, Aug 27, 2012 at 04:23:31PM +0800, zhangfei gao wrote:
> On Mon, Aug 27, 2012 at 1:46 PM, Haojian Zhuang haojian.zhuang@gmail.com wrote:
>> Let me summarize it. First, the user-space address is mapped. Then flushing of that user-space address is triggered. It's a workaround for the missing kernel virtual address, rather than a proper fix via vmap() or some other solution; it's just a quick fix.
>> Zhangfei, I suspect the issue may be caused by a missing memory barrier. The flushing uses coprocessor instructions, so it's a little different.
> Is there any limitation preventing dmac_map_area & dmac_flush_range from supporting an address mapped from user space?
They DEFINITELY DO NOT SUPPORT FLUSHING USER SPACE.
On Mon, Aug 27, 2012 at 4:29 PM, Russell King - ARM Linux linux@arm.linux.org.uk wrote:
> On Mon, Aug 27, 2012 at 04:23:31PM +0800, zhangfei gao wrote:
>> On Mon, Aug 27, 2012 at 1:46 PM, Haojian Zhuang haojian.zhuang@gmail.com wrote:
>>> Let me summarize it. First, the user-space address is mapped. Then flushing of that user-space address is triggered. It's a workaround for the missing kernel virtual address, rather than a proper fix via vmap() or some other solution; it's just a quick fix.
>>> Zhangfei, I suspect the issue may be caused by a missing memory barrier. The flushing uses coprocessor instructions, so it's a little different.
>> Is there any limitation preventing dmac_map_area & dmac_flush_range from supporting an address mapped from user space?
> They DEFINITELY DO NOT SUPPORT FLUSHING USER SPACE.
Thanks, Russell.
Any suggestions for flushing the cache by cache line instead of PAGE_SIZE? To target a specific area, we use the address from user space directly, which may not be page-aligned.
We have use cases for flushing the cache by cache line (CPU - cache - DDR - GPU):
1. For correctness, the driver should flush only the used size; if aligned to PAGE_SIZE, other areas may be flushed by mistake.
2. For efficiency, cache-line alignment is preferred.
Thanks
On Mon, Aug 27, 2012 at 4:56 PM, zhangfei gao zhangfei.gao@gmail.com wrote:
> Any suggestions for flushing the cache by cache line instead of PAGE_SIZE? To target a specific area, we use the address from user space directly, which may not be page-aligned.
> We have use cases for flushing the cache by cache line (CPU - cache - DDR - GPU):
> 1. For correctness, the driver should flush only the used size; if aligned to PAGE_SIZE, other areas may be flushed by mistake.
> 2. For efficiency, cache-line alignment is preferred.
I think flushing at PAGE_SIZE granularity may be acceptable. ION/PMEM is designed for sharing large memory regions. Maybe you can monitor the partial-flush use cases; I suspect they always flush several pages, not several cache lines.
On Mon, Aug 27, 2012 at 4:29 PM, Russell King - ARM Linux linux@arm.linux.org.uk wrote:
> On Mon, Aug 27, 2012 at 04:23:31PM +0800, zhangfei gao wrote:
>> On Mon, Aug 27, 2012 at 1:46 PM, Haojian Zhuang haojian.zhuang@gmail.com wrote:
>>> Let me summarize it. First, the user-space address is mapped. Then flushing of that user-space address is triggered. It's a workaround for the missing kernel virtual address, rather than a proper fix via vmap() or some other solution; it's just a quick fix.
>>> Zhangfei, I suspect the issue may be caused by a missing memory barrier. The flushing uses coprocessor instructions, so it's a little different.
>> Is there any limitation preventing dmac_map_area & dmac_flush_range from supporting an address mapped from user space?
> They DEFINITELY DO NOT SUPPORT FLUSHING USER SPACE.
I agree that those APIs are defined to support kernel-space addresses only. But the implementation only flushes the given range with assembly code.
If we set aside the original intent of the API, the problem should be that the address mapping isn't fully set up yet. I suspect the first step hasn't really finished; since the second step uses a coprocessor instruction, it takes a shortcut. If we add a delay such as a memory barrier, the first step can truly finish before the second step runs.
On Mon, Aug 27, 2012 at 09:43:10PM +0800, Haojian Zhuang wrote:
> On Mon, Aug 27, 2012 at 4:29 PM, Russell King - ARM Linux linux@arm.linux.org.uk wrote:
>> On Mon, Aug 27, 2012 at 04:23:31PM +0800, zhangfei gao wrote:
>>> On Mon, Aug 27, 2012 at 1:46 PM, Haojian Zhuang haojian.zhuang@gmail.com wrote:
>>>> Let me summarize it. First, the user-space address is mapped. Then flushing of that user-space address is triggered. It's a workaround for the missing kernel virtual address, rather than a proper fix via vmap() or some other solution; it's just a quick fix.
>>>> Zhangfei, I suspect the issue may be caused by a missing memory barrier. The flushing uses coprocessor instructions, so it's a little different.
>>> Is there any limitation preventing dmac_map_area & dmac_flush_range from supporting an address mapped from user space?
>> They DEFINITELY DO NOT SUPPORT FLUSHING USER SPACE.
> I agree that those APIs are defined to support kernel-space addresses only. But the implementation only flushes the given range with assembly code.
No. Have you looked at what dma_map_page() and the scatterlist code do? They all want to deal with a struct page. Have you looked at how L2 caches are handled? That needs virt_to_phys() and similar to convert addresses, which only work on the linear-mapped region.
Have you thought about what happens when the user's page gets unmapped or swapped out (or even just aged)? I don't think you've given any of this any consideration whatsoever.
There's more here than just "oh, we can pass a random address X into this function and hope that nothing goes wrong." It's really not that simple.
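To illustrate the point: the streaming-DMA interfaces work in terms of struct page and kernel linear addresses, so syncing an ION buffer goes through its sg_table, roughly like this (a sketch; dev stands for whichever device the buffer is mapped for):

/* Sketch of the supported pattern: sync via the buffer's sg_table,
 * which carries struct page entries, instead of raw user pointers.
 * The heap's pages are pinned, so the swap-out concerns above do
 * not apply on this path. */
static void buffer_sync_for_device(struct device *dev,
                                   struct sg_table *table)
{
        /* dma_map_sg() converts addresses via struct page and the
         * kernel linear mapping (virt_to_phys() and friends) --
         * exactly what an arbitrary user address cannot provide. */
        dma_map_sg(dev, table->sgl, table->nents, DMA_BIDIRECTIONAL);
}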
On Sun, Aug 26, 2012 at 6:51 PM, zhangfei gao zhangfei.gao@gmail.com wrote:
> Hi, All
> We hit a problem with dmac_map_area & dmac_flush_range on user-space addresses: the mcr instruction does not return on an ARMv7 processor.
> The existing ION carveout heap does not support partial cache flush; the whole buffer is flushed every time.
> There is only one dirty bit for the carveout heap, because sg_table->nents is 1: in drivers/gpu/ion/ion_carveout_heap.c, ion_carveout_heap_map_dma() calls sg_alloc_table(table, 1, GFP_KERNEL), and ion_buffer_alloc_dirty() sizes the dirty map from pages = buffer->sg_table->nents.
> We want to support partial cache flush, aligned to the cache line instead of PAGE_SIZE, for efficiency. We considered extending the dirty bitmap, but that appears to allow only PAGE_SIZE alignment.
> As an experiment we modified the ION_IOC_SYNC ioctl on ARMv7 to call dmac_map_area & dmac_flush_range directly with the address from user space. However, dmac_map_area does not work with this user-space address; more precisely, the mcr instruction cannot handle it and hangs. Also, ion_vm_fault happens twice. The first time is from __dabt_usr, when we access the mmapped buffer, and that is fine. The second is from __dabt_svc, caused by the mcr, which is strange.
> The sequence is:
> 1. ION allocates from the carveout heap, and user space mmaps the buffer.
> 2. User space accesses the mapped address; ion_vm_fault (from __dabt_usr) builds the page table entry via vm_insert_page.
> 3. dmac_map_area & dmac_flush_range are called with that address; ion_vm_fault fires again (from __dabt_svc) and the mcr hangs.
> We do not understand why ion_vm_fault happens twice when the page table has already been built, or why the mcr hangs with an address from user space.
> Besides, there is no problem with ION on 3.0, which does not use ion_vm_fault.
> Any suggestions?
> Thanks
I can't think of any use case where you would want sub-page cache flushing, and it isn't a feature I plan on supporting. Generically, we should probably extend the dirty map so that you can have page-wise flushing of carveout allocations. Again, that's something I can implement when I get a chance, or you can submit a patch to me that adds it; the code in ion_system_heap.c that does the same can serve as an example. Note that the current implementation of the carveout heap doesn't support cached mappings at all -- it should print an error if you try to create one.
Rebecca
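For what it's worth, a sketch of the page-wise extension Rebecca describes, following the pattern in ion_system_heap.c (untested; assumes the carveout allocation is physically contiguous):

/* Untested sketch: one sg entry per page instead of one for the
 * whole buffer, so the per-page dirty bitmap becomes meaningful. */
struct sg_table *ion_carveout_heap_map_dma(struct ion_heap *heap,
                                           struct ion_buffer *buffer)
{
        struct sg_table *table;
        struct scatterlist *sg;
        int npages = PAGE_ALIGN(buffer->size) / PAGE_SIZE;
        int i, ret;

        table = kzalloc(sizeof(*table), GFP_KERNEL);
        if (!table)
                return ERR_PTR(-ENOMEM);
        ret = sg_alloc_table(table, npages, GFP_KERNEL);
        if (ret) {
                kfree(table);
                return ERR_PTR(ret);
        }
        for_each_sg(table->sgl, sg, table->nents, i)
                sg_set_page(sg,
                            phys_to_page(buffer->priv_phys + i * PAGE_SIZE),
                            PAGE_SIZE, 0);
        return table;
}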
> I can't think of any use case where you would want sub-page cache flushing, and it isn't a feature I plan on supporting. Generically, we should probably extend the dirty map so that you can have page-wise flushing of carveout allocations. Again, that's something I can implement when I get a chance, or you can submit a patch to me that adds it; the code in ion_system_heap.c that does the same can serve as an example. Note that the current implementation of the carveout heap doesn't support cached mappings at all -- it should print an error if you try to create one.
> Rebecca
Thanks, Rebecca.
We are still not sure whether we need a specific dma_data_direction. Currently DMA_BIDIRECTIONAL is used by default, and there is no interface to pass the direction through.
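For reference, direction-aware syncing might look roughly like this (a sketch; the interface for passing the direction down from user space is the missing, hypothetical part):

/* Sketch: direction-aware syncing, if the ioctl interface carried
 * an enum dma_data_direction.  The user-space plumbing is the
 * hypothetical part. */
static void buffer_sync(struct device *dev, struct sg_table *table,
                        enum dma_data_direction dir)
{
        if (dir == DMA_FROM_DEVICE)
                /* device wrote, CPU will read: invalidate */
                dma_sync_sg_for_cpu(dev, table->sgl, table->nents, dir);
        else
                /* DMA_TO_DEVICE: clean; DMA_BIDIRECTIONAL: both */
                dma_sync_sg_for_device(dev, table->sgl, table->nents, dir);
}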
Besides, which code base should we use if we want to submit a patch? How about common.git, the android-3.4 branch?
Thanks