- Linaro-mm-sig - lists.linaro.org

Re: [Linaro-mm-sig] [PATCH] RFC: dma-buf: userspace mmap support

by Alan Cox

> > dma-buf file descriptor. Userspace access to the buffer should be
> > bracketed with DMA_BUF_IOCTL_{PREPARE,FINISH}_ACCESS ioctl calls to
> > give the exporting driver a chance to deal with cache synchronization
> > and such for cached userspace mappings without resorting to page

There should be flags indicating if this is necessary. We don't want extra
syscalls on hardware that doesn't need it. The other question is what info
is needed as you may only want to poke a few pages out of cache and the
prepare/finish on its own gives no info.

> E.g. If another device was writing to the buffer, the prepare ioctl
> could block until that device had finished accessing that buffer.

How do you avoid deadlocks on this? We need very clear ways to ensure
things always complete in some form given multiple buffer owner/requestors
and the fact this API has no "prepare-multiple-buffers" support.

Alan
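The two requests in the reply above (a way to say what kind of access is intended, and a way to say which part of the buffer will be touched) could be sketched as an ioctl argument along the following lines. Nothing here is part of the RFC: the struct, the flag names and the helper are hypothetical, and the sketch only shows how a byte range maps onto the pages an exporter would need to maintain.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical access flags; the RFC ioctls take no argument at all. */
#define DMABUF_ACCESS_READ   (1u << 0)
#define DMABUF_ACCESS_WRITE  (1u << 1)

/* Hypothetical argument carrying access type and a region of interest. */
struct dmabuf_access_arg {
	uint32_t flags;   /* DMABUF_ACCESS_* bits */
	uint32_t pad;     /* keeps the 64-bit fields aligned */
	uint64_t offset;  /* byte offset of the region */
	uint64_t length;  /* region size in bytes; 0 means the whole buffer */
};

struct page_span {
	uint64_t first;   /* index of the first page touched */
	uint64_t count;   /* number of pages touched; 0 if none */
};

/* Work out which pages a request covers, clamped to the buffer size. */
static struct page_span dmabuf_access_pages(const struct dmabuf_access_arg *a,
					    uint64_t buf_size,
					    uint64_t page_size)
{
	struct page_span s = { 0, 0 };
	uint64_t off = a->offset;
	uint64_t len = a->length;

	if (len == 0) {               /* whole-buffer request */
		off = 0;
		len = buf_size;
	}
	if (off >= buf_size || len == 0)
		return s;             /* nothing inside the buffer */
	if (len > buf_size - off)
		len = buf_size - off; /* clamp to the end of the buffer */

	s.first = off / page_size;
	s.count = (off + len - 1) / page_size - s.first + 1;
	return s;
}
```

With an argument like this, an exporter could limit cache maintenance to the returned span, and hardware that needs no maintenance could ignore it entirely.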

13 years, 9 months


[PATCH] dma-buf: pass flags into dma_buf_fd.

by Dave Airlie

From: Dave Airlie <airlied(a)redhat.com>

We need to pass the flags into dma_buf_fd at this point,
so the flags end up doing the right thing for O_CLOEXEC.

Signed-off-by: Dave Airlie <airlied(a)redhat.com>
---
 drivers/base/dma-buf.c  |    5 +++--
 include/linux/dma-buf.h |    2 +-
 2 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/drivers/base/dma-buf.c b/drivers/base/dma-buf.c
index c9a945f..3c8c023 100644
--- a/drivers/base/dma-buf.c
+++ b/drivers/base/dma-buf.c
@@ -107,17 +107,18 @@ EXPORT_SYMBOL_GPL(dma_buf_export);
 /**
  * dma_buf_fd - returns a file descriptor for the given dma_buf
  * @dmabuf:	[in]	pointer to dma_buf for which fd is required.
+ * @flags:	[in]	flags to give to fd
  *
  * On success, returns an associated 'fd'. Else, returns error.
  */
-int dma_buf_fd(struct dma_buf *dmabuf)
+int dma_buf_fd(struct dma_buf *dmabuf, int flags)
 {
 	int error, fd;
 
 	if (!dmabuf || !dmabuf->file)
 		return -EINVAL;
 
-	error = get_unused_fd();
+	error = get_unused_fd_flags(flags);
 	if (error < 0)
 		return error;
 	fd = error;
diff --git a/include/linux/dma-buf.h b/include/linux/dma-buf.h
index a885b26..891457a 100644
--- a/include/linux/dma-buf.h
+++ b/include/linux/dma-buf.h
@@ -117,7 +117,7 @@ void dma_buf_detach(struct dma_buf *dmabuf,
 				struct dma_buf_attachment *dmabuf_attach);
 struct dma_buf *dma_buf_export(void *priv, const struct dma_buf_ops *ops,
 			       size_t size, int flags);
-int dma_buf_fd(struct dma_buf *dmabuf);
+int dma_buf_fd(struct dma_buf *dmabuf, int flags);
 struct dma_buf *dma_buf_get(int fd);
 void dma_buf_put(struct dma_buf *dmabuf);
 
-- 
1.7.6

13 years, 9 months


[PATCH] dma-buf: add get_dma_buf()

by Rob Clark

From: Rob Clark <rob(a)ti.com>

Works in a similar way to get_file(), and is needed in cases such as
when the exporter needs to also keep a reference to the dmabuf (that
is later released with a dma_buf_put()), and possibly other similar
cases.

Signed-off-by: Rob Clark <rob(a)ti.com>
---
Minor update on original to add a missing #include

 include/linux/dma-buf.h |   15 +++++++++++++++
 1 files changed, 15 insertions(+), 0 deletions(-)

diff --git a/include/linux/dma-buf.h b/include/linux/dma-buf.h
index 891457a..bc4203dc 100644
--- a/include/linux/dma-buf.h
+++ b/include/linux/dma-buf.h
@@ -30,6 +30,7 @@
 #include <linux/scatterlist.h>
 #include <linux/list.h>
 #include <linux/dma-mapping.h>
+#include <linux/fs.h>
 
 struct dma_buf;
 struct dma_buf_attachment;
@@ -110,6 +111,20 @@ struct dma_buf_attachment {
 	void *priv;
 };
 
+/**
+ * get_dma_buf - convenience wrapper for get_file.
+ * @dmabuf:	[in]	pointer to dma_buf
+ *
+ * Increments the reference count on the dma-buf, needed in case of drivers
+ * that either need to create additional references to the dmabuf on the
+ * kernel side. For example, an exporter that needs to keep a dmabuf ptr
+ * so that subsequent exports don't create a new dmabuf.
+ */
+static inline void get_dma_buf(struct dma_buf *dmabuf)
+{
+	get_file(dmabuf->file);
+}
+
 #ifdef CONFIG_DMA_SHARED_BUFFER
 struct dma_buf_attachment *dma_buf_attach(struct dma_buf *dmabuf,
 							struct device *dev);
-- 
1.7.5.4
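The get/put pairing this patch describes can be modelled in plain userspace C. This is a toy model only: mock_buf, mock_export(), mock_get() and mock_put() are hypothetical names, and the plain int counter stands in for the atomic reference count the kernel keeps on the struct file.

```c
#include <stdlib.h>

/* Toy stand-in for a dma_buf; not the kernel structure. */
struct mock_buf {
	int refcount;   /* plain int here; the kernel count is atomic */
	int *released;  /* set to 1 when the last reference is dropped */
};

/* Create a buffer holding one reference, as an export would. */
static struct mock_buf *mock_export(int *released)
{
	struct mock_buf *b = malloc(sizeof(*b));

	if (!b)
		return NULL;
	b->refcount = 1;
	b->released = released;
	*released = 0;
	return b;
}

/* Analogue of get_dma_buf(): take one more reference. */
static void mock_get(struct mock_buf *b)
{
	b->refcount++;
}

/* Analogue of dma_buf_put(): drop a reference, release on the last one. */
static void mock_put(struct mock_buf *b)
{
	if (--b->refcount == 0) {
		*b->released = 1;
		free(b);
	}
}
```

An exporter that caches the buffer would call mock_get() once for its own pointer; the buffer then survives the importer's put and is only released when the exporter drops its cached reference.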

13 years, 9 months


Re: [Linaro-mm-sig] [PATCH] RFC: dma-buf: userspace mmap support

by Rob Clark

On Fri, Mar 16, 2012 at 12:24 PM, Tom Cooksey <tom.cooksey(a)arm.com> wrote:
>
>> From: Rob Clark <rob(a)ti.com>
>>
>> Enable optional userspace access to dma-buf buffers via mmap() on the
>> dma-buf file descriptor. Userspace access to the buffer should be
>> bracketed with DMA_BUF_IOCTL_{PREPARE,FINISH}_ACCESS ioctl calls to
>> give the exporting driver a chance to deal with cache synchronization
>> and such for cached userspace mappings without resorting to page
>> faulting tricks. The reasoning behind this is that, while drm
>> drivers tend to have all the mechanisms in place for dealing with
>> page faulting tricks, other driver subsystems may not. And in
>> addition, while page faulting tricks make userspace simpler, there
>> are some associated overheads.
>
> Speaking for the ARM Mali T6xx driver point of view, this API looks
> good for us. Our use-case for mmap is glReadPixels and
> glTex[Sub]Image2D on buffers the driver has imported via dma_buf. In
> the case of glReadPixels, the finish ioctl isn't strictly necessary
> as the CPU won't have written to the buffer and so doesn't need
> flushing. As such, we'd get an additional cache flush which isn't
> really necessary. But hey, it's glReadPixels - it's supposed to be
> slow. :-)
>
> I think requiring the finish ioctl in the API contract is a good
> idea, even if the CPU has only done a ro access as it allows future
> enhancements*. To "fix" the unnecessary flush in glReadPixels, I
> think we'd like to keep the finish but see an "access type"
> parameter added to prepare ioctl indicating if the access is ro or
> rw to allow the cache flush in finish to be skipped if the access
> was ro. As Rebecca says, a debug feature could even be added to
> re-map the pages as ro in prepare(ro) to catch naughty accesses. I'd
> also go as far as to say the debug feature should completely unmap
> the pages after finish too. Though for us, both the access-type
> parameter and debug features are "nice to haves" - we can make
> progress with the code as it currently stands (assuming exporters
> start using the API that is).

Perhaps it isn't a bad idea to include an access-type bitmask in the
first version. It would help optimize the cache operations a bit.

> Something which also came up when discussing internally is the topic
> of mmap APIs of the importing device driver. For example, I believe
> DRM has an mmap API on GEM buffer objects. If a new dma_buf import
> ioctl was added to GEM (maybe the PRIME patches already add this),
> how would GEM's bo mmap API work?

My first thought is maybe we should just disallow this for now until
we have a chance to see if there are any possible issues with an
importer mmap'ing the buffer to userspace. We could possibly have a
helper dma_buf_mmap() fxn which in turn calls dmabuf ops->mmap() so
the mmap'ing is actually performed by the exporter on behalf of the
importer.

>
> * Future enhancements: The prepare/finish bracketing could be used
> as part of a wider synchronization scheme with other devices.
> E.g. If another device was writing to the buffer, the prepare ioctl
> could block until that device had finished accessing that buffer.
> In the same way, another device could be blocked from accessing that
> buffer until the client process called finish. We have already
> started playing with such a scheme in the T6xx driver stack we're
> terming "kernel dependency system". In this scheme each buffer has a
> FIFO of "buffer consumers" waiting to access a buffer. The idea
> being that a "buffer consumer" is fairly abstract and could be any
> device or userspace process participating in the synchronization
> scheme. Examples would be GPU jobs, display controller "scan-out"
> jobs, etc.
>
> So for example, a userspace application could dispatch a GPU
> fragment shading job into the GPU's kernel driver which will write
> to a KMS scanout buffer. The application then immediately issues a
> drm_mode_crtc_page_flip ioctl on the display controller's DRM driver
> to display the soon-to-be-rendered buffer. Inside the kernel, the
> GPU driver adds the fragment job to the dma_buf's FIFO. As the FIFO
> was empty, dma_buf calls into the GPU kernel driver to tell it it
> "owns" access to the buffer and the GPU driver schedules the job to
> run on the GPU. Upon receiving the drm_mode_crtc_page_flip ioctl,
> the DRM/KMS driver adds a scan-out job to the buffer's FIFO.
> However, the FIFO already has the GPU's fragment shading job in it
> so nothing happens until the GPU job completes. When the GPU job
> completes, the GPU driver calls into dma_buf to mark its job
> complete. dma_buf then takes the next job in its FIFO which the KMS
> driver's scanout job, calls into the KMS driver to schedule the
> pageflip. The result? A buffer gets scanned out as soon as it has
> finished being rendered without needing a round-trip to userspace.
> Sure, there are easier ways to achieve that goal, but the idea is
> that the mechanism can be used to synchronize access across multiple
> devices, which makes it useful for lots of other use-cases too.
>
>
> As I say, we have already implemented something which works as I
> describe but where the buffers are abstract resources not linked to
> dma_buf. I'd like to discuss the finer points of the mechanisms
> further, but if it's looking like there's interest in this approach
> we'll start re-writing the code we have to sit on-top of dma_buf
> and posting it as RFCs to the various lists. The intention is to get
> this to mainline, if mainline wants it. :-)

I think we do need some sort of 'sync object' (which might really just
be a 'synchronization queue' object) in the kernel. It probably
shouldn't be built-in to dma-buf, but I expect we'd want the dma_buf
struct to have a 'struct sync_queue *' (or whatever it ends up being
called).

The sync-queue seems like a reasonable approach for pure cpu-sw based
synchronization. The only thing I'm not sure about is how to also deal
with hw that supports any sort of auto synchronization without cpu sw
involvement.

BR,
-R

> Personally, what I particularly like about this approach to
> synchronization is that it doesn't require any interfaces to be
> modified. I think/hope that makes it easier to port existing drivers
> and sub-systems to take advantage of it. The buffer itself is the
> synchronization object and interfaces already pass buffers around so
> don't need modification. There are of course some limitations with
> this approach, the main one we can think of being that it can't
> really be used for A/V sync. It kinda assumes "jobs" in the FIFO
> should be run as soon as the preceding job completes, which isn't
> the case when streaming real-time video. Though nothing precludes
> more explicit sync objects being used in conjunction with this
> approach.
>
>
> Cheers,
>
> Tom
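The per-buffer FIFO of "buffer consumers" discussed in this message can be modelled in a few lines of userspace C. This is a rough sketch under stated assumptions, not the T6xx "kernel dependency system" and not an existing kernel API: the type and function names (sync_queue, syncq_add(), syncq_complete(), job_start()) are hypothetical, the queue depth is arbitrary, and there is no locking.

```c
#include <stddef.h>

#define SYNCQ_MAX 8  /* arbitrary depth for the sketch */

typedef void (*consumer_run_fn)(void *ctx);

struct consumer {
	consumer_run_fn run;  /* called when the consumer gets the buffer */
	void *ctx;
};

/* Hypothetical per-buffer queue; the head entry is the current owner. */
struct sync_queue {
	struct consumer q[SYNCQ_MAX];
	size_t head;
	size_t count;
};

static void syncq_init(struct sync_queue *sq)
{
	sq->head = 0;
	sq->count = 0;
}

/*
 * Queue a consumer. If the FIFO was empty the consumer owns the buffer
 * at once and is started. Returns 0 on success, -1 if the queue is full.
 */
static int syncq_add(struct sync_queue *sq, consumer_run_fn run, void *ctx)
{
	size_t tail;

	if (sq->count == SYNCQ_MAX)
		return -1;
	tail = (sq->head + sq->count) % SYNCQ_MAX;
	sq->q[tail].run = run;
	sq->q[tail].ctx = ctx;
	sq->count++;
	if (sq->count == 1)
		run(ctx);
	return 0;
}

/* The current owner is done: drop it and start the next one, if any. */
static void syncq_complete(struct sync_queue *sq)
{
	if (sq->count == 0)
		return;
	sq->head = (sq->head + 1) % SYNCQ_MAX;
	sq->count--;
	if (sq->count > 0)
		sq->q[sq->head].run(sq->q[sq->head].ctx);
}

/* Small tracing helpers so the start order can be observed. */
struct trace {
	int started[16];
	int n;
};

struct job {
	struct trace *t;
	int id;
};

static void job_start(void *ctx)
{
	struct job *j = ctx;

	if (j->t->n < 16)
		j->t->started[j->t->n++] = j->id;
}
```

In the page-flip example from the quoted text, the GPU job would be added first and start immediately, the scan-out job would be added second and stay queued, and the GPU driver's completion call would start the scan-out without a round-trip to userspace.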

13 years, 9 months


[PATCH] RFC: dma-buf: userspace mmap support

by Rob Clark

From: Rob Clark <rob(a)ti.com>

Enable optional userspace access to dma-buf buffers via mmap() on the
dma-buf file descriptor. Userspace access to the buffer should be
bracketed with DMA_BUF_IOCTL_{PREPARE,FINISH}_ACCESS ioctl calls to
give the exporting driver a chance to deal with cache synchronization
and such for cached userspace mappings without resorting to page
faulting tricks. The reasoning behind this is that, while drm
drivers tend to have all the mechanisms in place for dealing with
page faulting tricks, other driver subsystems may not. And in
addition, while page faulting tricks make userspace simpler, there
are some associated overheads.

In all cases, the mmap() call is allowed to fail, and the associated
dma_buf_ops are optional (mmap() will fail if at least the mmap()
op is not implemented by the exporter, but in either case the
{prepare,finish}_access() ops are optional).

For now the prepare/finish access ioctls are kept simple with no
argument, although there is possibility to add additional ioctls
(or simply change the existing ioctls from _IO() to _IOW()) later
to provide optimization to allow userspace to specify a region of
interest.

For a final patch, dma-buf.h would need to be split into what is
exported to userspace, and what is kernel private, but I wanted to
get feedback on the idea of requiring userspace to bracket access
first (vs. limiting this to coherent mappings or exporters who play
page faulting plus PTE shoot-down games) before I split the header
which would cause conflicts with other pending dma-buf patches. So
flame-on!
---
 drivers/base/dma-buf.c  |   42 ++++++++++++++++++++++++++++++++++++++++++
 include/linux/dma-buf.h |   22 ++++++++++++++++++++++
 2 files changed, 64 insertions(+), 0 deletions(-)

diff --git a/drivers/base/dma-buf.c b/drivers/base/dma-buf.c
index c9a945f..382b78a 100644
--- a/drivers/base/dma-buf.c
+++ b/drivers/base/dma-buf.c
@@ -30,6 +30,46 @@
 
 static inline int is_dma_buf_file(struct file *);
 
+static int dma_buf_mmap(struct file *file, struct vm_area_struct *vma)
+{
+	struct dma_buf *dmabuf;
+
+	if (!is_dma_buf_file(file))
+		return -EINVAL;
+
+	dmabuf = file->private_data;
+
+	if (dmabuf->ops->mmap)
+		return dmabuf->ops->mmap(dmabuf, file, vma);
+
+	return -ENODEV;
+}
+
+static long dma_buf_ioctl(struct file *file, unsigned int cmd,
+		unsigned long arg)
+{
+	struct dma_buf *dmabuf;
+
+	if (!is_dma_buf_file(file))
+		return -EINVAL;
+
+	dmabuf = file->private_data;
+
+	switch (_IOC_NR(cmd)) {
+	case _IOC_NR(DMA_BUF_IOCTL_PREPARE_ACCESS):
+		if (dmabuf->ops->prepare_access)
+			return dmabuf->ops->prepare_access(dmabuf);
+		return 0;
+	case _IOC_NR(DMA_BUF_IOCTL_FINISH_ACCESS):
+		if (dmabuf->ops->finish_access)
+			return dmabuf->ops->finish_access(dmabuf);
+		return 0;
+	default:
+		return -EINVAL;
+	}
+}
+
+
 static int dma_buf_release(struct inode *inode, struct file *file)
 {
 	struct dma_buf *dmabuf;
@@ -45,6 +85,8 @@ static int dma_buf_release(struct inode *inode, struct file *file)
 }
 
 static const struct file_operations dma_buf_fops = {
+	.mmap		= dma_buf_mmap,
+	.unlocked_ioctl	= dma_buf_ioctl,
 	.release	= dma_buf_release,
 };
 
diff --git a/include/linux/dma-buf.h b/include/linux/dma-buf.h
index a885b26..cbdff81 100644
--- a/include/linux/dma-buf.h
+++ b/include/linux/dma-buf.h
@@ -34,6 +34,17 @@
 struct dma_buf;
 struct dma_buf_attachment;
 
+/* TODO: dma-buf.h should be the userspace visible header, and dma-buf-priv.h (?)
+ * the kernel internal header.. for now just stuff these here to avoid conflicting
+ * with other patches..
+ *
+ * For now, no arg to keep things simple, but we could consider adding an
+ * optional region of interest later.
+ */
+#define DMA_BUF_IOCTL_PREPARE_ACCESS   _IO('Z', 0)
+#define DMA_BUF_IOCTL_FINISH_ACCESS    _IO('Z', 1)
+
+
 /**
  * struct dma_buf_ops - operations possible on struct dma_buf
  * @attach: [optional] allows different devices to 'attach' themselves to the
@@ -49,6 +60,13 @@ struct dma_buf_attachment;
  * @unmap_dma_buf: decreases usecount of buffer, might deallocate scatter
  *		   pages.
  * @release: release this buffer; to be called after the last dma_buf_put.
+ * @mmap: [optional, allowed to fail] operation called if userspace calls
+ *		 mmap() on the dmabuf fd. Note that userspace should use the
+ *		 DMA_BUF_PREPARE_ACCESS / DMA_BUF_FINISH_ACCESS ioctls before/after
+ *		 sw access to the buffer, to give the exporter an opportunity to
+ *		 deal with cache maintenance.
+ * @prepare_access: [optional] handler for PREPARE_ACCESS ioctl.
+ * @finish_access: [optional] handler for FINISH_ACCESS ioctl.
  */
 struct dma_buf_ops {
 	int (*attach)(struct dma_buf *, struct device *,
@@ -72,6 +90,10 @@ struct dma_buf_ops {
 	/* after final dma_buf_put() */
 	void (*release)(struct dma_buf *);
 
+	int (*mmap)(struct dma_buf *, struct file *, struct vm_area_struct *);
+	int (*prepare_access)(struct dma_buf *);
+	int (*finish_access)(struct dma_buf *);
+
 };
 
 /**
-- 
1.7.5.4
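From the userspace side, the bracketing this RFC asks for might look roughly like the sketch below. The two ioctl numbers are copied from the patch above and are only a proposal, so they should not be read as a released kernel ABI. The helper dmabuf_cpu_access(), the callback type and fill_zero() are hypothetical names introduced for illustration.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <linux/ioctl.h>

/* Copied from the RFC patch; a proposal, not a released ABI. */
#define DMA_BUF_IOCTL_PREPARE_ACCESS   _IO('Z', 0)
#define DMA_BUF_IOCTL_FINISH_ACCESS    _IO('Z', 1)

typedef void (*cpu_access_fn)(void *vaddr, size_t len, void *ctx);

/*
 * Map the dma-buf, run fn() between PREPARE_ACCESS and FINISH_ACCESS,
 * then unmap. Returns 0 on success, -1 with errno set on failure.
 */
static int dmabuf_cpu_access(int dmabuf_fd, size_t len,
			     cpu_access_fn fn, void *ctx)
{
	void *vaddr;
	int ret, saved;

	vaddr = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED,
		     dmabuf_fd, 0);
	if (vaddr == MAP_FAILED)
		return -1;  /* the exporter is allowed to refuse mmap() */

	ret = ioctl(dmabuf_fd, DMA_BUF_IOCTL_PREPARE_ACCESS);
	if (ret == 0) {
		fn(vaddr, len, ctx);
		ret = ioctl(dmabuf_fd, DMA_BUF_IOCTL_FINISH_ACCESS);
	}

	saved = errno;       /* keep the ioctl error across munmap() */
	munmap(vaddr, len);
	errno = saved;
	return ret ? -1 : 0;
}

/* Example callback: clear the whole buffer. */
static void fill_zero(void *vaddr, size_t len, void *ctx)
{
	(void)ctx;
	memset(vaddr, 0, len);
}
```

A caller would pass the fd obtained from an exporter, for example dmabuf_cpu_access(fd, size, fill_zero, NULL). Keeping the mapping alive across many prepare/finish pairs, rather than remapping each time as this sketch does, would probably be the more realistic pattern.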

13 years, 9 months


expected userspace prime/dma-buf usage

by Dave Airlie

Just wondering how we expect userspace to use dma-buf/prime interfaces.

Currently I see one driver sharing the buffer with handle->fd, then
passing the fd to the other driver, which uses fd->handle. Do we then
expect the importing driver to close the fd?

Dave.

13 years, 9 months


[PATCH v3 0/4] Add CMA heap for ION memory manager

by benjamin.gaignard＠stericsson.com

From: benjamin gaignard <benjamin.gaignard(a)linaro.org>

The goal of these patches is to allow ION clients (drivers or userland
applications) to use the Contiguous Memory Allocator (CMA).

To get more info about CMA:
http://lists.linaro.org/pipermail/linaro-mm-sig/2012-February/001328.html

patches version 3:
- add a private field in ion_heap structure instead of exposing the
  ion_device structure to all heaps
- ion_cma_heap is no longer a platform driver
- ion_cma_heap uses the ion_heap private field to store the device pointer
  and make the link with reserved CMA regions
- provide ux500-ion driver and configuration file for snowball board to
  give an example of how to use CMA heaps

patches version 2:
- fix comments done by Andy Green

Benjamin Gaignard (4):
  fix ion_platform_data definition
  add private field in ion_heap structure
  add CMA heap
  add test/example driver for ux500 platform

 arch/arm/mach-ux500/board-mop500.c |   80 +++++++++++++++++++
 drivers/gpu/ion/Kconfig            |    6 ++
 drivers/gpu/ion/Makefile           |    2 +
 drivers/gpu/ion/cma/Makefile       |    1 +
 drivers/gpu/ion/cma/ion_cma_heap.c |  126 ++++++++++++++++++++++++++++++
 drivers/gpu/ion/cma/ion_cma_heap.h |   11 +++
 drivers/gpu/ion/ion_priv.h         |    2 +
 drivers/gpu/ion/ux500/Makefile     |    1 +
 drivers/gpu/ion/ux500/ux500_ion.c  |  147 ++++++++++++++++++++++++++++++++++++
 include/linux/ion.h                |    2 +-
 10 files changed, 377 insertions(+), 1 deletions(-)
 create mode 100644 drivers/gpu/ion/cma/Makefile
 create mode 100644 drivers/gpu/ion/cma/ion_cma_heap.c
 create mode 100644 drivers/gpu/ion/cma/ion_cma_heap.h
 create mode 100644 drivers/gpu/ion/ux500/Makefile
 create mode 100644 drivers/gpu/ion/ux500/ux500_ion.c

13 years, 9 months


[PATCH 0/3] Add CMA heap for ION memory manager

by benjamin.gaignard＠stericsson.com

From: benjamin gaignard <benjamin.gaignard(a)linaro.org>

The goal of these patches is to allow ION clients (drivers or userland
applications) to use the Contiguous Memory Allocator (CMA).

To get more info about CMA:
http://lists.linaro.org/pipermail/linaro-mm-sig/2012-February/001328.html

patches version 2:
- fix comments done by Andy Green

Benjamin Gaignard (3):
  make struct ion_device available for other heap
  fix ion_platform_data definition
  add CMA heap

 drivers/gpu/ion/Kconfig            |    5 +
 drivers/gpu/ion/Makefile           |    1 +
 drivers/gpu/ion/cma/Makefile       |    1 +
 drivers/gpu/ion/cma/ion_cma_heap.c |  217 ++++++++++++++++++++++++++++++++++++
 drivers/gpu/ion/ion.c              |   20 ----
 drivers/gpu/ion/ion_priv.h         |   22 ++++
 include/linux/ion.h                |    2 +-
 7 files changed, 247 insertions(+), 21 deletions(-)
 create mode 100644 drivers/gpu/ion/cma/Makefile
 create mode 100644 drivers/gpu/ion/cma/ion_cma_heap.c

13 years, 9 months


for-next inclusion request: dma-buf buffer sharing framework

by Sumit Semwal

Hi Stephen,

May I request you to please add the dma-buf buffer sharing framework
tree to linux-next?

It is hosted here:
git://git.linaro.org/people/sumitsemwal/linux-dma-buf.git
branch: for-next

-- 
Thanks and best regards,
Sumit Semwal.

13 years, 9 months


[PATCHv23 00/16] Contiguous Memory Allocator

by Marek Szyprowski

Hi,

This is (yet another) quick update of CMA patches. I've rebased them
onto next-20120222 tree from
git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git and
fixed the bug pointed by Aaro Koskinen.

Best regards
Marek Szyprowski
Samsung Poland R&D Center

Links to previous versions of the patchset:
v22: <http://www.spinics.net/lists/linux-media/msg44370.html>
v21: <http://www.spinics.net/lists/linux-media/msg44155.html>
v20: <http://www.spinics.net/lists/linux-mm/msg29145.html>
v19: <http://www.spinics.net/lists/linux-mm/msg29145.html>
v18: <http://www.spinics.net/lists/linux-mm/msg28125.html>
v17: <http://www.spinics.net/lists/arm-kernel/msg148499.html>
v16: <http://www.spinics.net/lists/linux-mm/msg25066.html>
v15: <http://www.spinics.net/lists/linux-mm/msg23365.html>
v14: <http://www.spinics.net/lists/linux-media/msg36536.html>
v13: (internal, intentionally not released)
v12: <http://www.spinics.net/lists/linux-media/msg35674.html>
v11: <http://www.spinics.net/lists/linux-mm/msg21868.html>
v10: <http://www.spinics.net/lists/linux-mm/msg20761.html>
 v9: <http://article.gmane.org/gmane.linux.kernel.mm/60787>
 v8: <http://article.gmane.org/gmane.linux.kernel.mm/56855>
 v7: <http://article.gmane.org/gmane.linux.kernel.mm/55626>
 v6: <http://article.gmane.org/gmane.linux.kernel.mm/55626>
 v5: (intentionally left out as CMA v5 was identical to CMA v4)
 v4: <http://article.gmane.org/gmane.linux.kernel.mm/52010>
 v3: <http://article.gmane.org/gmane.linux.kernel.mm/51573>
 v2: <http://article.gmane.org/gmane.linux.kernel.mm/50986>
 v1: <http://article.gmane.org/gmane.linux.kernel.mm/50669>

Changelog:

v23:
    1. fixed bug spotted by Aaro Koskinen (incorrect check inside VM_BUG_ON)

    2. rebased onto next-20120222 tree from
       git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git

v22:
    1. Fixed compilation break caused by missing fixup patch in v21

    2. Fixed typos in the comments

    3. Removed superfluous #include entries

v21:
    1. Fixed incorrect check which broke memory compaction code

    2. Fixed hacky and racy min_free_kbytes handling

    3. Added serialization patch to watermark calculation

    4. Fixed typos here and there in the comments

v20 and earlier - see previous patchsets.

Patches in this patchset:

Marek Szyprowski (6):
  mm: extract reclaim code from __alloc_pages_direct_reclaim()
  mm: trigger page reclaim in alloc_contig_range() to stabilise watermarks
  drivers: add Contiguous Memory Allocator
  X86: integrate CMA with DMA-mapping subsystem
  ARM: integrate CMA with DMA-mapping subsystem
  ARM: Samsung: use CMA for 2 memory banks for s5p-mfc device

Mel Gorman (1):
  mm: Serialize access to min_free_kbytes

Michal Nazarewicz (9):
  mm: page_alloc: remove trailing whitespace
  mm: compaction: introduce isolate_migratepages_range()
  mm: compaction: introduce map_pages()
  mm: compaction: introduce isolate_freepages_range()
  mm: compaction: export some of the functions
  mm: page_alloc: introduce alloc_contig_range()
  mm: page_alloc: change fallbacks array handling
  mm: mmzone: MIGRATE_CMA migration type added
  mm: page_isolation: MIGRATE_CMA isolation functions added

 Documentation/kernel-parameters.txt   |    9 +
 arch/Kconfig                          |    3 +
 arch/arm/Kconfig                      |    2 +
 arch/arm/include/asm/dma-contiguous.h |   15 ++
 arch/arm/include/asm/mach/map.h       |    1 +
 arch/arm/kernel/setup.c               |    9 +-
 arch/arm/mm/dma-mapping.c             |  369 ++++++++++++++++++++++++------
 arch/arm/mm/init.c                    |   23 ++-
 arch/arm/mm/mm.h                      |    3 +
 arch/arm/mm/mmu.c                     |   31 ++-
 arch/arm/plat-s5p/dev-mfc.c           |   51 +----
 arch/x86/Kconfig                      |    1 +
 arch/x86/include/asm/dma-contiguous.h |   13 +
 arch/x86/include/asm/dma-mapping.h    |    4 +
 arch/x86/kernel/pci-dma.c             |   18 ++-
 arch/x86/kernel/pci-nommu.c           |    8 +-
 arch/x86/kernel/setup.c               |    2 +
 drivers/base/Kconfig                  |   89 +++++++
 drivers/base/Makefile                 |    1 +
 drivers/base/dma-contiguous.c         |  401 +++++++++++++++++++++++++++++++
 include/asm-generic/dma-contiguous.h  |   28 +++
 include/linux/device.h                |    4 +
 include/linux/dma-contiguous.h        |  110 +++++++++
 include/linux/gfp.h                   |   12 +
 include/linux/mmzone.h                |   47 +++-
 include/linux/page-isolation.h        |   18 +-
 mm/Kconfig                            |    2 +-
 mm/Makefile                           |    3 +-
 mm/compaction.c                       |  418 +++++++++++++++++++++------------
 mm/internal.h                         |   33 +++
 mm/memory-failure.c                   |    2 +-
 mm/memory_hotplug.c                   |    6 +-
 mm/page_alloc.c                       |  409 ++++++++++++++++++++++++++++----
 mm/page_isolation.c                   |   15 +-
 mm/vmstat.c                           |    3 +
 35 files changed, 1790 insertions(+), 373 deletions(-)
 create mode 100644 arch/arm/include/asm/dma-contiguous.h
 create mode 100644 arch/x86/include/asm/dma-contiguous.h
 create mode 100644 drivers/base/dma-contiguous.c
 create mode 100644 include/asm-generic/dma-contiguous.h
 create mode 100644 include/linux/dma-contiguous.h

-- 
1.7.1.569.g6f426

13 years, 9 months


Test application for DMABUF sharing between V4L2 and DRM

by Tomasz Stanislawski

Hi Everyone,

This email contains a test application showing DMABUF sharing between a
DRM/KMS display and a camera capture node. It shows a simple camera preview
on an LCD display. A similar application showing DMABUF sharing between two
V4L devices is available at:

http://thread.gmane.org/gmane.linux.drivers.video-input-infrastructure/4379…

The program is written in C99 and it was tested using Exynos/DRM and FIMC
capture for M5MOLS and S5K6AAFX sensors on UniversalC210 board.

This application shows what buffer sharing between V4L2 and DRM may look
like. Please let me know if/where I use DRM/V4L2 incorrectly.

The application was tested against 3.3-rc5 kernel with patches:

http://thread.gmane.org/gmane.linux.kernel.cross-arch/12819
[redesign of DMA mapping]

http://thread.gmane.org/gmane.linux.drivers.video-input-infrastructure/4379…
[support for dma_get_pages, PoC generic API for transforming DMA object into
list of pages]

http://thread.gmane.org/gmane.comp.video.dri.devel/65583/focus=65703
[DRM prime support]

http://git.infradead.org/users/kmpark/linux-samsung/shortlog/refs/heads/exy…
[DRM prime support for Exynos DRM]

http://thread.gmane.org/gmane.comp.video.dri.devel/65992
[fix to DRM prime in Exynos DRM]

http://thread.gmane.org/gmane.linux.drivers.video-input-infrastructure/4296…
[support for DMABUF importing in V4L2]

http://thread.gmane.org/gmane.linux.drivers.video-input-infrastructure/45394
[integrate V4L2 with DMABUF]

Regards,
Tomasz Stanislawski

---

#include <errno.h>
#include <fcntl.h>
#include <linux/videodev2.h>
#include <math.h>
#include <poll.h>
#include <stdarg.h>	/* va_list, needed by warn() */
#include <stdio.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

#include <xf86drm.h>
#include <xf86drmMode.h>
#include <exynos_drm.h>

#define ERRSTR strerror(errno)

/* Print an error with file/line and abort if cond is true. */
#define BYE_ON(cond, ...) \
do { \
	if (cond) { \
		int errsv = errno; \
		fprintf(stderr, "ERROR(%s:%d) : ", \
			__FILE__, __LINE__); \
		errno = errsv; \
		fprintf(stderr, __VA_ARGS__); \
		abort(); \
	} \
} while(0)

static inline int warn(const char *file, int line, const char *fmt, ...)
{
	int errsv = errno;
	va_list va;
	va_start(va, fmt);
	fprintf(stderr, "WARN(%s:%d): ", file, line);
	vfprintf(stderr, fmt, va);
	va_end(va);
	errno = errsv;
	return 1;
}

/* Print a warning and evaluate to 1 if cond is true, else 0. */
#define WARN_ON(cond, ...) \
	((cond) ? warn(__FILE__, __LINE__, __VA_ARGS__) : 0)

struct setup {
	char module[32];
	uint32_t conId;
	uint32_t crtId;
	char modestr[32];
	char video[32];
	unsigned int w, h;
	unsigned int use_wh : 1;
	unsigned int in_fourcc;
	unsigned int out_fourcc;
	unsigned int buffer_count;
	unsigned int use_crop : 1;
	unsigned int use_compose : 1;
	struct v4l2_rect crop;
	struct v4l2_rect compose;
};

struct buffer {
	unsigned int bo_handle;
	unsigned int fb_handle;
	int dbuf_fd;
};

struct stream {
	int v4lfd;
	int current_buffer;
	int buffer_count;
	struct buffer *buffer;
} stream;

static void usage(char *name)
{
	fprintf(stderr, "usage: %s [-MoiSfFstbh]\n", name);
	fprintf(stderr, "\t-M <drm-module>\tset DRM module\n");
	fprintf(stderr, "\t-o <connector_id>:<crtc_id>:<mode>\tset a mode\n");
	fprintf(stderr, "\t-i <video-node>\tset video node like /dev/video*\n");
	fprintf(stderr, "\t-S <width,height>\tset input resolution\n");
	fprintf(stderr, "\t-f <fourcc>\tset input format using 4cc\n");
	fprintf(stderr, "\t-F <fourcc>\tset output format using 4cc\n");
	fprintf(stderr, "\t-s <width,height>@<left,top>\tset crop area\n");
	fprintf(stderr, "\t-t <width,height>@<left,top>\tset compose area\n");
	fprintf(stderr, "\t-b buffer_count\tset number of buffers\n");
	fprintf(stderr, "\t-h\tshow this help\n");
	fprintf(stderr, "\n\tDefault is to dump all info.\n");
}

/* Parse "<width,height>@<left,top>"; returns non-zero on error. */
static inline int parse_rect(char *s, struct v4l2_rect *r)
{
	/* width/height are unsigned, left/top are signed in v4l2_rect */
	return sscanf(s, "%u,%u@%d,%d", &r->width, &r->height,
		&r->left, &r->top) != 4;
}

static int parse_args(int argc, char *argv[], struct setup *s)
{
	if (argc <= 1)
		usage(argv[0]);

	int c, ret;
	memset(s, 0, sizeof(*s));

	while ((c = getopt(argc, argv, "M:o:i:S:f:F:s:t:b:h")) != -1) {
		switch (c) {
		case 'M':
			strncpy(s->module, optarg, 31);
			break;
		case 'o':
			ret = sscanf(optarg, "%u:%u:%31s", &s->conId, &s->crtId,
				s->modestr);
			if (WARN_ON(ret != 3, "incorrect mode description\n"))
				return -1;
			break;
		case 'i':
			strncpy(s->video, optarg, 31);
			break;
		case 'S':
			ret = sscanf(optarg, "%u,%u", &s->w, &s->h);
			if (WARN_ON(ret != 2, "incorrect input size\n"))
				return -1;
			s->use_wh = 1;
			break;
		case 'f':
			if (WARN_ON(strlen(optarg) != 4, "invalid fourcc\n"))
				return -1;
			s->in_fourcc = ((unsigned)optarg[0] << 0) |
				((unsigned)optarg[1] << 8) |
				((unsigned)optarg[2] << 16) |
				((unsigned)optarg[3] << 24);
			break;
		case 'F':
			if (WARN_ON(strlen(optarg) != 4, "invalid fourcc\n"))
				return -1;
			s->out_fourcc = ((unsigned)optarg[0] << 0) |
				((unsigned)optarg[1] << 8) |
				((unsigned)optarg[2] << 16) |
				((unsigned)optarg[3] << 24);
			break;
		case 's':
			ret = parse_rect(optarg, &s->crop);
			if (WARN_ON(ret, "incorrect crop area\n"))
				return -1;
			s->use_crop = 1;
			break;
		case 't':
			ret = parse_rect(optarg, &s->compose);
			if (WARN_ON(ret, "incorrect compose area\n"))
				return -1;
			s->use_compose = 1;
			break;
		case 'b':
			ret = sscanf(optarg, "%u", &s->buffer_count);
			if (WARN_ON(ret != 1, "incorrect buffer count\n"))
				return -1;
			break;
		case '?':
		case 'h':
			usage(argv[0]);
			return -1;
		}
	}

	return 0;
}

/* Create a GEM object, export it as a dma-buf fd and wrap it in a DRM fb. */
static int buffer_create(struct buffer *b, int drmfd, struct setup *s,
	uint64_t size, uint32_t pitch)
{
	int ret = strncmp(s->module, "exynos", 6);
	if (WARN_ON(ret, "drm: only exynos GEM is supported\n"))
		return -1;

	struct drm_exynos_gem_create gem;
	struct drm_gem_close gem_close;

	memset(&gem, 0, sizeof gem);
	gem.size = size;

	ret = ioctl(drmfd, DRM_IOCTL_EXYNOS_GEM_CREATE, &gem);
	if (WARN_ON(ret, "EXYNOS_GEM_CREATE failed: %s\n", ERRSTR))
		return -1;
	b->bo_handle = gem.handle;

	struct drm_prime_handle prime;
	memset(&prime, 0, sizeof prime);
	prime.handle = b->bo_handle;

	ret = ioctl(drmfd, DRM_IOCTL_PRIME_HANDLE_TO_FD, &prime);
	if (WARN_ON(ret, "PRIME_HANDLE_TO_FD failed: %s\n", ERRSTR))
		goto fail_gem;
	printf("dbuf_fd = %d\n", prime.fd);
	b->dbuf_fd = prime.fd;

	uint32_t offsets[4] = { 0 };
	uint32_t pitches[4] = { pitch };
	uint32_t bo_handles[4] = { b->bo_handle };
	unsigned int fourcc = s->out_fourcc;
	if (!fourcc)
		fourcc = s->in_fourcc;
	ret = drmModeAddFB2(drmfd, s->w, s->h, fourcc, bo_handles,
		pitches, offsets, &b->fb_handle, 0);
	if (WARN_ON(ret, "drmModeAddFB2 failed: %s\n", ERRSTR))
		goto fail_prime;

	return 0;

fail_prime:
	close(b->dbuf_fd);

fail_gem:
	memset(&gem_close, 0, sizeof gem_close);
	gem_close.handle = b->bo_handle;
	/* the ioctl takes a pointer to the struct, not the struct itself */
	ret = ioctl(drmfd, DRM_IOCTL_GEM_CLOSE, &gem_close);
	WARN_ON(ret, "GEM_CLOSE failed: %s\n", ERRSTR);

	return -1;
}

static int find_mode(drmModeModeInfo *m, int drmfd, struct setup *s,
	uint32_t *con)
{
	int ret = -1;
	drmModeRes *res = drmModeGetResources(drmfd);
	if (WARN_ON(!res, "drmModeGetResources failed: %s\n", ERRSTR))
		return -1;

	if (WARN_ON(res->count_crtcs <= 0, "drm: no crts\n"))
		goto fail_res;

	if (WARN_ON(res->count_connectors <= 0, "drm: no connectors\n"))
		goto fail_res;

	if (WARN_ON(s->conId >= res->count_connectors, "connector %u "
		"is not supported\n", s->conId))
		goto fail_res;

	drmModeConnector *c;
	c = drmModeGetConnector(drmfd, res->connectors[s->conId]);
	if (WARN_ON(!c, "drmModeGetConnector failed: %s\n", ERRSTR))
		goto fail_res;

	if (WARN_ON(!c->count_modes, "connector supports no mode\n"))
		goto fail_conn;

	drmModeModeInfo *found = NULL;
	for (int i = 0; i < c->count_modes; ++i)
		if (strcmp(c->modes[i].name, s->modestr) == 0)
			found = &c->modes[i];

	if (WARN_ON(!found, "mode %s not supported\n", s->modestr)) {
		fprintf(stderr, "Valid modes:");
		for (int i = 0; i < c->count_modes; ++i)
			fprintf(stderr, " %s", c->modes[i].name);
		fprintf(stderr, "\n");
		goto fail_conn;
	}

	memcpy(m, found, sizeof *found);
	if (con)
		*con = c->connector_id;
	ret = 0;

fail_conn:
	drmModeFreeConnector(c);

fail_res:
	drmModeFreeResources(res);

	return ret;
}

/* On page flip, hand the previously displayed buffer back to V4L2. */
static void page_flip_handler(int fd, unsigned int frame,
	unsigned int sec, unsigned int usec, void *data)
{
	int index = stream.current_buffer;
	struct v4l2_buffer buf;
	struct v4l2_plane plane;
	int ret;

	/* the buffer index travels in the user-data pointer */
	stream.current_buffer = (int)(intptr_t)data;
	if (index < 0)
		return;

	memset(&buf, 0, sizeof buf);
	memset(&plane, 0, sizeof plane);
	buf.type = V4L2_BUF_TYPE_VIDEO_CAPTURE_MPLANE;
	buf.memory = V4L2_MEMORY_DMABUF;
	buf.index = index;
	buf.m.planes = &plane;
	buf.length = 1;
	plane.m.fd = stream.buffer[index].dbuf_fd;

	ret = ioctl(stream.v4lfd, VIDIOC_QBUF, &buf);
	BYE_ON(ret, "VIDIOC_QBUF(index = %d) failed: %s\n", index, ERRSTR);
}

int main(int argc, char *argv[])
{
	int ret;
	struct setup s;

	ret = parse_args(argc, argv, &s);
	BYE_ON(ret, "failed to parse arguments\n");
	BYE_ON(s.module[0] == 0, "DRM module is missing\n");
	BYE_ON(s.video[0] == 0, "video node is missing\n");

	int drmfd = drmOpen(s.module, NULL);
	BYE_ON(drmfd < 0, "drmOpen(%s) failed: %s\n", s.module, ERRSTR);

	int v4lfd = open(s.video, O_RDWR);
	BYE_ON(v4lfd < 0, "failed to open %s: %s\n", s.video, ERRSTR);

	struct v4l2_capability caps;
	memset(&caps, 0, sizeof caps);

	ret = ioctl(v4lfd, VIDIOC_QUERYCAP, &caps);
	BYE_ON(ret, "VIDIOC_QUERYCAP failed: %s\n", ERRSTR);

	/* TODO: add single plane support */
	BYE_ON(~caps.capabilities & V4L2_CAP_VIDEO_CAPTURE_MPLANE,
		"video: multiplanar capture is not supported\n");

	struct v4l2_format fmt;
	memset(&fmt, 0, sizeof fmt);
	fmt.type = V4L2_BUF_TYPE_VIDEO_CAPTURE_MPLANE;

	ret = ioctl(v4lfd, VIDIOC_G_FMT, &fmt);
	BYE_ON(ret < 0, "VIDIOC_G_FMT failed: %s\n", ERRSTR);
	printf("G_FMT(start): width = %u, height = %u, 4cc = %.4s\n",
		fmt.fmt.pix_mp.width, fmt.fmt.pix_mp.height,
		(char*)&fmt.fmt.pix_mp.pixelformat);

	if (s.use_wh) {
		fmt.fmt.pix_mp.width = s.w;
		fmt.fmt.pix_mp.height = s.h;
	}
	if (s.in_fourcc)
		fmt.fmt.pix_mp.pixelformat = s.in_fourcc;

	ret = ioctl(v4lfd, VIDIOC_S_FMT, &fmt);
	BYE_ON(ret < 0, "VIDIOC_S_FMT failed: %s\n", ERRSTR);

	ret = ioctl(v4lfd, VIDIOC_G_FMT, &fmt);
	BYE_ON(ret < 0, "VIDIOC_G_FMT failed: %s\n", ERRSTR);
	printf("G_FMT(final): width = %u, height = %u, 4cc = %.4s\n",
		fmt.fmt.pix_mp.width, fmt.fmt.pix_mp.height,
		(char*)&fmt.fmt.pix_mp.pixelformat);

	BYE_ON(fmt.fmt.pix_mp.num_planes > 1,
		"multiplanar formats are not supported\n");

	struct v4l2_requestbuffers rqbufs;
	memset(&rqbufs, 0, sizeof(rqbufs));
	rqbufs.count = s.buffer_count;
	rqbufs.type = V4L2_BUF_TYPE_VIDEO_CAPTURE_MPLANE;
	rqbufs.memory = V4L2_MEMORY_DMABUF;

	ret = ioctl(v4lfd, VIDIOC_REQBUFS, &rqbufs);
	BYE_ON(ret < 0, "VIDIOC_REQBUFS failed: %s\n", ERRSTR);
	BYE_ON(rqbufs.count < s.buffer_count, "video node allocated only "
		"%u of %u buffers\n", rqbufs.count, s.buffer_count);

	s.in_fourcc = fmt.fmt.pix_mp.pixelformat;
	s.w = fmt.fmt.pix_mp.width;
	s.h = fmt.fmt.pix_mp.height;

	/* TODO: add support for multiplanar formats */
	struct buffer buffer[s.buffer_count];
	uint64_t size = fmt.fmt.pix_mp.plane_fmt[0].sizeimage;
	uint32_t pitch = fmt.fmt.pix_mp.plane_fmt[0].bytesperline;
	printf("size = %llu pitch = %u\n", (unsigned long long)size, pitch);
	for (int i = 0; i < s.buffer_count; ++i) {
		ret = buffer_create(&buffer[i], drmfd, &s, size, pitch);
		BYE_ON(ret, "failed to create buffer%d\n", i);
	}
	printf("buffers ready\n");

	drmModeModeInfo drmmode;
	uint32_t con;
	ret = find_mode(&drmmode, drmfd, &s, &con);
	BYE_ON(ret, "failed to find valid mode\n");

	ret = drmModeSetCrtc(drmfd, s.crtId, buffer[0].fb_handle, 0, 0,
		&con, 1, &drmmode);
	BYE_ON(ret, "drmModeSetCrtc failed: %s\n", ERRSTR);

	/* enqueueing first buffer to DRM */
	ret = drmModePageFlip(drmfd, s.crtId, buffer[0].fb_handle,
		DRM_MODE_PAGE_FLIP_EVENT, 0);
	BYE_ON(ret, "drmModePageFlip failed: %s\n", ERRSTR);

	/* the remaining buffers are queued to the V4L2 capture node */
	for (int i = 1; i < s.buffer_count; ++i) {
		struct v4l2_plane plane;
		memset(&plane, 0, sizeof plane);

		struct v4l2_buffer buf;
		memset(&buf, 0, sizeof buf);

		buf.index = i;
		buf.type = V4L2_BUF_TYPE_VIDEO_CAPTURE_MPLANE;
		buf.memory = V4L2_MEMORY_DMABUF;
		buf.m.planes = &plane;
		buf.length = 1;
		plane.m.fd = buffer[i].dbuf_fd;
		ret = ioctl(v4lfd, VIDIOC_QBUF, &buf);
BYE_ON(ret < 0, "VIDIOC_QBUF for buffer %d failed: %s\n", buf.index, ERRSTR); } int type = V4L2_BUF_TYPE_VIDEO_CAPTURE_MPLANE; ret = ioctl(v4lfd, VIDIOC_STREAMON, &type); BYE_ON(ret < 0, "STREAMON failed: %s\n", ERRSTR); struct pollfd fds[] = { { .fd = v4lfd, .events = POLLIN }, { .fd = drmfd, .events = POLLIN }, }; /* buffer currently used by drm */ stream.v4lfd = v4lfd; stream.current_buffer = -1; stream.buffer = buffer; while ((ret = poll(fds, 2, 5000)) > 0) { if (fds[0].revents & POLLIN) { struct v4l2_buffer buf; memset(&buf, 0, sizeof buf); /* dequeue buffer */ buf.type = V4L2_BUF_TYPE_VIDEO_CAPTURE_MPLANE; buf.memory = V4L2_MEMORY_DMABUF; ret = ioctl(v4lfd, VIDIOC_DQBUF, &buf); BYE_ON(ret, "VIDIOC_DQBUF failed: %s\n", ERRSTR); ret = drmModePageFlip(drmfd, s.crtId, buffer[buf.index].fb_handle, DRM_MODE_PAGE_FLIP_EVENT, (void*)buf.index); BYE_ON(ret, "drmModePageFlip failed: %s\n", ERRSTR); } if (fds[1].revents & POLLIN) { drmEventContext evctx; memset(&evctx, 0, sizeof evctx); evctx.version = DRM_EVENT_CONTEXT_VERSION; evctx.page_flip_handler = page_flip_handler; ret = drmHandleEvent(drmfd, &evctx); BYE_ON(ret, "drmHandleEvent failed: %s\n", ERRSTR); } } return 0; }
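The -f and -F options in the program above pack a four-character code into a 32-bit value by hand, once per option. A minimal sketch of the same packing as a reusable helper follows; make_fourcc is a hypothetical name that does not appear in the posted program, and the length check simply mirrors the strlen() test used there.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Pack a four-character code such as "NV12" into a 32-bit value, with the
 * first character in the least significant byte.
 * make_fourcc is a hypothetical helper, not part of the posted program.
 * Returns 0 when the string is not exactly four characters long.
 */
static uint32_t make_fourcc(const char *code)
{
	assert(code != NULL);

	if (strlen(code) != 4)
		return 0;

	/* Go through unsigned char so bytes >= 0x80 are not sign-extended. */
	return ((uint32_t)(unsigned char)code[0] << 0) |
	       ((uint32_t)(unsigned char)code[1] << 8) |
	       ((uint32_t)(unsigned char)code[2] << 16) |
	       ((uint32_t)(unsigned char)code[3] << 24);
}
```

With such a helper both option cases would reduce to a single call each, for example s->in_fourcc = make_fourcc(optarg), with a zero result treated as "invalid fourcc".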

13 years, 9 months


[PATCH 0/3] [RFC] kernel cpu access support for dma_buf

by Daniel Vetter

Hi all,

This series implements an interface to enable cpu access from the kernel context to dma_buf objects. The main design goal of this interface proposal is to enable buffer objects that reside in highmem.

Comments, flames, ideas and questions highly welcome. Although I might be a bit slow in responding - I'm at conferences and on vacation for the next 2 weeks.

Cheers, Daniel

Daniel Vetter (3):
  dma-buf: don't hold the mutex around map/unmap calls
  dma-buf: add support for kernel cpu access
  dma_buf: Add documentation for the new cpu access support

 Documentation/dma-buf-sharing.txt |  102 +++++++++++++++++++++++++++++-
 drivers/base/dma-buf.c            |  124 +++++++++++++++++++++++++++++++++++-
 include/linux/dma-buf.h           |   62 ++++++++++++++++++-
 3 files changed, 280 insertions(+), 8 deletions(-)

-- 
1.7.7.5

13 years, 9 months


dma-buf feature tree: working model

by Sumit Semwal

Hi all,

Since the inclusion of the dma-buf buffer sharing framework in 3.3 (thanks to Dave Airlie primarily), I have been volunteered to be its maintainer.

Obviously there is a need for some simple rules about the dma-buf feature tree, so here we are:

- there will be a 'for-next' branch for (N+1), which will open around -Nrc1, and close about 1-2 weeks before the (N+1) merge window opens.
- there will be a 'fixes' branch, which will take fixes after the for-next pull request is sent upstream.
  - after -rc2, regression fixes only.
  - after -rc4/5, only revert and disable patches. The real fix should then be targeted at for-next.
- to stop me from pushing useless stuff, I will merge my own patches only after sufficient review on our mailing lists. If you see me breaking this rule, please shout out at me _publicly_ at the top of your voice.

Being a 'first-time-maintainer', I am very willing to learn on-the-job, though I might still take cover under the 'first-time-maintainer' umbrella [for some time :)] for any stupid acts I might commit.

The tree resides at: git://git.linaro.org/people/sumitsemwal/linux-dma-buf.git

At present, the mailing lists are: linux-media(a)vger.kernel.org, dri-devel(a)lists.freedesktop.org, linaro-mm-sig(a)lists.linaro.org, in addition to lkml.

Comments, flames and suggestions highly welcome.

(I have been 'influenced' quite a bit by Daniel Vetter's model for the drm/i915 -next tree [thank you, DanVet!], but any errors/omissions are entirely mine.)

Thanks and regards,
~Sumit.

13 years, 9 months


[PATCH 0/7] RFCv3 VGEM Prime (dma-buf)

by Ben Widawsky

I'm going to be off doing other things for the next couple of weeks, so I'm dropping these now to give them a nice soak while I'm gone.

Dave/Daniel: could you look these over and tell me if the general direction seems good?
Ajax: anything you're missing in the basic vgem stuff?

Since the last time:
- Squashed down the original vgem patches
- Use dumb_bo functions/ditched VGEM ioctls
- Hooked up prime import and export support

On the prime side, the major difference from what Dave has done before is a per driver hash of the previously used dma bufs/gem objects.

The prime stuff is of particularly low quality at this point; like I said, I'm trying to get something out before I disappear for a while. So please don't yell at me about obvious bugs :). After getting feedback on what I have now, I will incorporate Dave's earlier work on i915 prime, and get some better test cases going.

On my todos:
- Ascii chart of dmabuf/drm object life cycle
- hashify the per file list
- i915 per driver hash
- vgem-i915 and vice versa test cases

As before, the very basic tools are here: git://people.freedesktop.org/~bwidawsk/vgem-gpu-tools

Once we get the cpu maps that I think Daniel wants to work on, I can do even better tests with just VGEM.

Adam Jackson (1):
  drm/vgem: virtual GEM provider

Ben Widawsky (5):
  drm: DRM_DEBUG_PRIME
  drm: per device prime dma buf hash
  drm/vgem: prime export support
  drm/vgem: import support
  drm: actually enable PRIME

Dave Airlie (1):
  drm: base prime support

 drivers/gpu/drm/Kconfig             |    9 +
 drivers/gpu/drm/Makefile            |    3 +-
 drivers/gpu/drm/drm_drv.c           |    3 +
 drivers/gpu/drm/drm_gem.c           |    4 +
 drivers/gpu/drm/drm_prime.c         |  172 +++++++++++++++
 drivers/gpu/drm/drm_stub.c          |    8 +
 drivers/gpu/drm/vgem/Makefile       |    4 +
 drivers/gpu/drm/vgem/vgem_dma_buf.c |  248 ++++++++++++++++++++++
 drivers/gpu/drm/vgem/vgem_drv.c     |  389 +++++++++++++++++++++++++++++++++++
 drivers/gpu/drm/vgem/vgem_drv.h     |   61 ++++++
 include/drm/drm.h                   |   10 +-
 include/drm/drmP.h                  |   55 +++++
 12 files changed, 964 insertions(+), 2 deletions(-)
 create mode 100644 drivers/gpu/drm/drm_prime.c
 create mode 100644 drivers/gpu/drm/vgem/Makefile
 create mode 100644 drivers/gpu/drm/vgem/vgem_dma_buf.c
 create mode 100644 drivers/gpu/drm/vgem/vgem_drv.c
 create mode 100644 drivers/gpu/drm/vgem/vgem_drv.h

-- 
1.7.9.1

13 years, 9 months


[PATCH 1/1] dma-mapping: Introduce arm_iommu_iova functions

by Hiroshi Doyu

From: Hiroshi DOYU <hdoyu(a)nvidia.com>

Enable to allocate iova area with a specified address.

Signed-off-by: Hiroshi DOYU <hdoyu(a)nvidia.com>
---
 arch/arm/include/asm/dma-iommu.h |    9 ++++
 arch/arm/mm/dma-mapping.c        |   98 ++++++++++++++++++++++++++++++++++----
 2 files changed, 98 insertions(+), 9 deletions(-)

diff --git a/arch/arm/include/asm/dma-iommu.h b/arch/arm/include/asm/dma-iommu.h
index 799b094..8bfaeef 100644
--- a/arch/arm/include/asm/dma-iommu.h
+++ b/arch/arm/include/asm/dma-iommu.h
@@ -30,5 +30,14 @@ void arm_iommu_release_mapping(struct dma_iommu_mapping *mapping);
 int arm_iommu_attach_device(struct device *dev,
 					struct dma_iommu_mapping *mapping);
 
+dma_addr_t arm_iommu_alloc_iova(struct device *dev, dma_addr_t iova,
+				size_t size);
+
+void arm_iommu_free_iova(struct device *dev, dma_addr_t addr, size_t size);
+
+size_t arm_iommu_iova_avail(struct device *dev);
+
+size_t arm_iommu_iova_max_free(struct device *dev);
+
 #endif /* __KERNEL__ */
 #endif
diff --git a/arch/arm/mm/dma-mapping.c b/arch/arm/mm/dma-mapping.c
index 25baa16..f1af7a5 100644
--- a/arch/arm/mm/dma-mapping.c
+++ b/arch/arm/mm/dma-mapping.c
@@ -749,12 +749,61 @@ fs_initcall(dma_debug_do_init);
 
 /* IOMMU */
 
-static inline dma_addr_t __alloc_iova(struct dma_iommu_mapping *mapping,
-				      size_t size)
+size_t arm_iommu_iova_avail(struct device *dev)
+{
+	struct dma_iommu_mapping *mapping = dev->archdata.mapping;
+	unsigned long flags;
+	size_t bytes = 0;
+
+	spin_lock_irqsave(&mapping->lock, flags);
+	while (1) {
+		unsigned long start = 0, end;
+
+		start = bitmap_find_next_zero_area(mapping->bitmap,
+						   mapping->bits, start, 1, 0);
+		if (start > mapping->bits)
+			break;
+
+		end = find_next_bit(mapping->bitmap, mapping->bits, start);
+		bytes += end - start;
+		start = end;
+	}
+	spin_unlock_irqrestore(&mapping->lock, flags);
+	return bytes;
+}
+EXPORT_SYMBOL_GPL(arm_iommu_iova_avail);
+
+size_t arm_iommu_iova_max_free(struct device *dev)
+{
+	struct dma_iommu_mapping *mapping = dev->archdata.mapping;
+	unsigned long flags;
+	size_t max_free = 0;
+	unsigned long start = 0;
+
+	spin_lock_irqsave(&mapping->lock, flags);
+	while (1) {
+		unsigned long end;
+
+		start = bitmap_find_next_zero_area(mapping->bitmap,
+						   mapping->bits, start, 1, 0);
+		if (start > mapping->bits)
+			break;
+
+		end = find_next_bit(mapping->bitmap, mapping->bits, start);
+		max_free = max_t(size_t, max_free, end - start);
+		start = end;
+	}
+	spin_unlock_irqrestore(&mapping->lock, flags);
+	return max_free << PAGE_SHIFT;
+}
+EXPORT_SYMBOL_GPL(arm_iommu_iova_max_free);
+
+static dma_addr_t __alloc_iova_addr(struct dma_iommu_mapping *mapping,
+				    dma_addr_t iova, size_t size)
 {
 	unsigned int order = get_order(size);
 	unsigned int align = 0;
-	unsigned int count, start;
+	unsigned int count, orig, start;
 	unsigned long flags;
 
 	count = ((PAGE_ALIGN(size) >> PAGE_SHIFT) +
@@ -764,18 +813,41 @@ static inline dma_addr_t __alloc_iova(struct dma_iommu_mapping *mapping,
 		align = (1 << (order - mapping->order)) - 1;
 
 	spin_lock_irqsave(&mapping->lock, flags);
-	start = bitmap_find_next_zero_area(mapping->bitmap, mapping->bits, 0,
-					   count, align);
-	if (start > mapping->bits) {
-		spin_unlock_irqrestore(&mapping->lock, flags);
-		return ~0;
-	}
+
+	orig = iova ? (iova >> (mapping->order + PAGE_SHIFT)) : 0;
+
+	start = bitmap_find_next_zero_area(mapping->bitmap, mapping->bits,
+					   orig, count, align);
+	if (start > mapping->bits)
+		goto not_found;
+
+	if (iova && (orig != start))
+		goto not_found;
 
 	bitmap_set(mapping->bitmap, start, count);
 	spin_unlock_irqrestore(&mapping->lock, flags);
 
 	return mapping->base + (start << (mapping->order + PAGE_SHIFT));
+
+not_found:
+	spin_unlock_irqrestore(&mapping->lock, flags);
+	return ~0;
+}
+
+static inline dma_addr_t __alloc_iova(struct dma_iommu_mapping *mapping,
+				      size_t size)
+{
+	return __alloc_iova_addr(mapping, 0, size);
+}
+
+dma_addr_t arm_iommu_alloc_iova(struct device *dev, dma_addr_t iova,
+				size_t size)
+{
+	struct dma_iommu_mapping *mapping = dev->archdata.mapping;
+
+	return __alloc_iova_addr(mapping, iova, size);
 }
+EXPORT_SYMBOL_GPL(arm_iommu_alloc_iova);
 
 static inline void __free_iova(struct dma_iommu_mapping *mapping,
 			       dma_addr_t addr, size_t size)
@@ -791,6 +863,14 @@ static inline void __free_iova(struct dma_iommu_mapping *mapping,
 	spin_unlock_irqrestore(&mapping->lock, flags);
 }
 
+void arm_iommu_free_iova(struct device *dev, dma_addr_t addr, size_t size)
+{
+	struct dma_iommu_mapping *mapping = dev->archdata.mapping;
+
+	__free_iova(mapping, addr, size);
+}
+EXPORT_SYMBOL_GPL(arm_iommu_free_iova);
+
 static struct page **__iommu_alloc_buffer(struct device *dev, size_t size, gfp_t gfp)
 {
 	struct page **pages;
-- 
1.7.5.4
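The patch above walks an allocation bitmap to report the total free space, to report the largest free run, and to reserve a range at a caller-chosen position. A minimal userspace sketch of those three operations follows. It is a toy model only: NBITS, next_free, next_used and the iova_* helpers are hypothetical names, a plain bool array stands in for the kernel bitmap, and no locking, alignment or page-size scaling is modelled. The sketch keeps its scan cursor outside the loop so that each iteration resumes after the previous run.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NBITS 16 /* hypothetical toy IOVA space: 16 page-sized slots */

/* Index of the first free (false) slot at or after start, or NBITS. */
static size_t next_free(const bool *map, size_t start)
{
	while (start < NBITS && map[start])
		start++;
	return start;
}

/* Index of the first used (true) slot at or after start, or NBITS. */
static size_t next_used(const bool *map, size_t start)
{
	while (start < NBITS && !map[start])
		start++;
	return start;
}

/* Total number of free slots, summed run by run. */
static size_t iova_avail(const bool *map)
{
	size_t total = 0;
	size_t start = 0;

	while ((start = next_free(map, start)) < NBITS) {
		size_t end = next_used(map, start);

		total += end - start;
		start = end;
	}
	return total;
}

/* Length of the longest run of free slots. */
static size_t iova_max_free(const bool *map)
{
	size_t best = 0;
	size_t start = 0;

	while ((start = next_free(map, start)) < NBITS) {
		size_t end = next_used(map, start);

		if (end - start > best)
			best = end - start;
		start = end;
	}
	return best;
}

/*
 * Reserve count slots starting exactly at slot `at`.
 * Returns 0 on success, -1 if the range is out of bounds or partly in use.
 */
static int iova_alloc_at(bool *map, size_t at, size_t count)
{
	assert(map != NULL);

	if (count == 0 || at > NBITS || count > NBITS - at)
		return -1;

	for (size_t i = at; i < at + count; i++)
		if (map[i])
			return -1;

	for (size_t i = at; i < at + count; i++)
		map[i] = true;
	return 0;
}
```

A fixed-position request fails outright instead of falling back to another slot, which matches the intent of the "iova && (orig != start)" check in the patch.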

13 years, 9 months


[PATCHv6 0/7] ARM: DMA-mapping framework redesign

by Marek Szyprowski

Hello,

This is another update on my work on the DMA-mapping framework redesign for the ARM architecture. It includes a few minor cleanups and fixes since the last version posted at the end of December 2011.

This patch series is now based on the generic, cross-arch dma-mapping redesign patches posted in the "[PATCH 00/14] DMA-mapping framework redesign preparation" thread: http://www.spinics.net/lists/linux-sh/msg09777.html

All patches have now been rebased onto the v3.3-rc2 kernel.

All the code has been tested on the Samsung Exynos4 'UniversalC210' board with the IOMMU driver posted by KyongHo Cho.

History of the development:

v1: (initial version of the DMA-mapping redesign patches):
    http://www.spinics.net/lists/linux-mm/msg21241.html

v2: http://lists.linaro.org/pipermail/linaro-mm-sig/2011-September/000571.html
    http://lists.linaro.org/pipermail/linaro-mm-sig/2011-September/000577.html

v3: http://www.spinics.net/lists/linux-mm/msg25490.html

v4 and v5:
    http://www.spinics.net/lists/arm-kernel/msg151147.html
    http://www.spinics.net/lists/arm-kernel/msg154889.html

Best regards
--
Marek Szyprowski
Samsung Poland R&D Center

Patch summary:

Marek Szyprowski (7):
  ARM: dma-mapping: remove offset parameter to prepare for generic dma_ops
  ARM: dma-mapping: use asm-generic/dma-mapping-common.h
  ARM: dma-mapping: implement dma sg methods on top of any generic dma ops
  ARM: dma-mapping: move all dma bounce code to separate dma ops structure
  ARM: dma-mapping: remove redundant code and cleanup
  ARM: dma-mapping: use alloc, mmap, free from dma_ops
  ARM: dma-mapping: add support for IOMMU mapper

 arch/arm/Kconfig                   |    9 +
 arch/arm/common/dmabounce.c        |   78 +++-
 arch/arm/include/asm/device.h      |    4 +
 arch/arm/include/asm/dma-iommu.h   |   34 ++
 arch/arm/include/asm/dma-mapping.h |  404 +++++------------
 arch/arm/mm/dma-mapping.c          |  897 ++++++++++++++++++++++++++++++------
 arch/arm/mm/vmregion.h             |    2 +-
 7 files changed, 980 insertions(+), 448 deletions(-)
 create mode 100644 arch/arm/include/asm/dma-iommu.h

-- 
1.7.1.569.g6f426

13 years, 9 months


[PATCHv22 00/16] Contiguous Memory Allocator

by Marek Szyprowski

Hi,

This is yet another update of the CMA patches. I really promise this is the last one. The previous version was posted in a real hurry (just before leaving the office for the ELC trip) and I lost an important fixup patch in the final rebase.

Best regards
Marek Szyprowski
Samsung Poland R&D Center

Links to previous versions of the patchset:
v21: <http://www.spinics.net/lists/linux-media/msg44155.html>
v20: <http://www.spinics.net/lists/linux-mm/msg29145.html>
v19: <http://www.spinics.net/lists/linux-mm/msg29145.html>
v18: <http://www.spinics.net/lists/linux-mm/msg28125.html>
v17: <http://www.spinics.net/lists/arm-kernel/msg148499.html>
v16: <http://www.spinics.net/lists/linux-mm/msg25066.html>
v15: <http://www.spinics.net/lists/linux-mm/msg23365.html>
v14: <http://www.spinics.net/lists/linux-media/msg36536.html>
v13: (internal, intentionally not released)
v12: <http://www.spinics.net/lists/linux-media/msg35674.html>
v11: <http://www.spinics.net/lists/linux-mm/msg21868.html>
v10: <http://www.spinics.net/lists/linux-mm/msg20761.html>
 v9: <http://article.gmane.org/gmane.linux.kernel.mm/60787>
 v8: <http://article.gmane.org/gmane.linux.kernel.mm/56855>
 v7: <http://article.gmane.org/gmane.linux.kernel.mm/55626>
 v6: <http://article.gmane.org/gmane.linux.kernel.mm/55626>
 v5: (intentionally left out as CMA v5 was identical to CMA v4)
 v4: <http://article.gmane.org/gmane.linux.kernel.mm/52010>
 v3: <http://article.gmane.org/gmane.linux.kernel.mm/51573>
 v2: <http://article.gmane.org/gmane.linux.kernel.mm/50986>
 v1: <http://article.gmane.org/gmane.linux.kernel.mm/50669>

Changelog:

v22:
    1. Fixed compilation break caused by missing fixup patch in v21
    2. Fixed typos in the comments
    3. Removed superfluous #include entries

v21:
    1. Fixed incorrect check which broke memory compaction code
    2. Fixed hacky and racy min_free_kbytes handling
    3. Added serialization patch to watermark calculation
    4. Fixed typos here and there in the comments

v20 and earlier - see previous patchsets.

Patches in this patchset:

Marek Szyprowski (6):
  mm: extract reclaim code from __alloc_pages_direct_reclaim()
  mm: trigger page reclaim in alloc_contig_range() to stabilise watermarks
  drivers: add Contiguous Memory Allocator
  X86: integrate CMA with DMA-mapping subsystem
  ARM: integrate CMA with DMA-mapping subsystem
  ARM: Samsung: use CMA for 2 memory banks for s5p-mfc device

Mel Gorman (1):
  mm: Serialize access to min_free_kbytes

Michal Nazarewicz (9):
  mm: page_alloc: remove trailing whitespace
  mm: compaction: introduce isolate_migratepages_range()
  mm: compaction: introduce map_pages()
  mm: compaction: introduce isolate_freepages_range()
  mm: compaction: export some of the functions
  mm: page_alloc: introduce alloc_contig_range()
  mm: page_alloc: change fallbacks array handling
  mm: mmzone: MIGRATE_CMA migration type added
  mm: page_isolation: MIGRATE_CMA isolation functions added

 Documentation/kernel-parameters.txt   |    9 +
 arch/Kconfig                          |    3 +
 arch/arm/Kconfig                      |    2 +
 arch/arm/include/asm/dma-contiguous.h |   15 ++
 arch/arm/include/asm/mach/map.h       |    1 +
 arch/arm/kernel/setup.c               |    9 +-
 arch/arm/mm/dma-mapping.c             |  368 ++++++++++++++++++++++++------
 arch/arm/mm/init.c                    |   23 ++-
 arch/arm/mm/mm.h                      |    3 +
 arch/arm/mm/mmu.c                     |   31 ++-
 arch/arm/plat-s5p/dev-mfc.c           |   51 +----
 arch/x86/Kconfig                      |    1 +
 arch/x86/include/asm/dma-contiguous.h |   13 +
 arch/x86/include/asm/dma-mapping.h    |    4 +
 arch/x86/kernel/pci-dma.c             |   18 ++-
 arch/x86/kernel/pci-nommu.c           |    8 +-
 arch/x86/kernel/setup.c               |    2 +
 drivers/base/Kconfig                  |   89 +++++++
 drivers/base/Makefile                 |    1 +
 drivers/base/dma-contiguous.c         |  401 +++++++++++++++++++++++++++++++
 include/asm-generic/dma-contiguous.h  |   28 +++
 include/linux/device.h                |    4 +
 include/linux/dma-contiguous.h        |  110 +++++++++
 include/linux/gfp.h                   |   12 +
 include/linux/mmzone.h                |   47 +++-
 include/linux/page-isolation.h        |   18 +-
 mm/Kconfig                            |    2 +-
 mm/Makefile                           |    3 +-
 mm/compaction.c                       |  418 +++++++++++++++++++++------------
 mm/internal.h                         |   33 +++
 mm/memory-failure.c                   |    2 +-
 mm/memory_hotplug.c                   |    6 +-
 mm/page_alloc.c                       |  413 ++++++++++++++++++++++++++++----
 mm/page_isolation.c                   |   15 +-
 mm/vmstat.c                           |    3 +
 35 files changed, 1791 insertions(+), 375 deletions(-)
 create mode 100644 arch/arm/include/asm/dma-contiguous.h
 create mode 100644 arch/x86/include/asm/dma-contiguous.h
 create mode 100644 drivers/base/dma-contiguous.c
 create mode 100644 include/asm-generic/dma-contiguous.h
 create mode 100644 include/linux/dma-contiguous.h

13 years, 9 months


[PATCHv21 00/16] Contiguous Memory Allocator

by Marek Szyprowski

Hello,

This is yet another quick update on the CMA patches (this should be the last one, really). We fixed a minor bug that might cause incorrect operation of the memory compaction code, and merged some simple updates to the memory reclaim function called by alloc_contig_range.

I really hope that this will be the last iteration of this series.

Best regards
Marek Szyprowski
Samsung Poland R&D Center

Links to previous versions of the patchset:
v20: <http://www.spinics.net/lists/linux-mm/msg29145.html>
v19: <http://www.spinics.net/lists/linux-mm/msg29145.html>
v18: <http://www.spinics.net/lists/linux-mm/msg28125.html>
v17: <http://www.spinics.net/lists/arm-kernel/msg148499.html>
v16: <http://www.spinics.net/lists/linux-mm/msg25066.html>
v15: <http://www.spinics.net/lists/linux-mm/msg23365.html>
v14: <http://www.spinics.net/lists/linux-media/msg36536.html>
v13: (internal, intentionally not released)
v12: <http://www.spinics.net/lists/linux-media/msg35674.html>
v11: <http://www.spinics.net/lists/linux-mm/msg21868.html>
v10: <http://www.spinics.net/lists/linux-mm/msg20761.html>
 v9: <http://article.gmane.org/gmane.linux.kernel.mm/60787>
 v8: <http://article.gmane.org/gmane.linux.kernel.mm/56855>
 v7: <http://article.gmane.org/gmane.linux.kernel.mm/55626>
 v6: <http://article.gmane.org/gmane.linux.kernel.mm/55626>
 v5: (intentionally left out as CMA v5 was identical to CMA v4)
 v4: <http://article.gmane.org/gmane.linux.kernel.mm/52010>
 v3: <http://article.gmane.org/gmane.linux.kernel.mm/51573>
 v2: <http://article.gmane.org/gmane.linux.kernel.mm/50986>
 v1: <http://article.gmane.org/gmane.linux.kernel.mm/50669>

Changelog:

v21:
    1. Fixed incorrect check which broke memory compaction code
    2. Fixed hacky and racy min_free_kbytes handling
    3. Added serialization patch to watermark calculation
    4. Fixed typos here and there in the comments

v20 and earlier - see previous patchsets.

Patches in this patchset:

Marek Szyprowski (6):
  mm: extract reclaim code from __alloc_pages_direct_reclaim()
  mm: trigger page reclaim in alloc_contig_range() to stabilise watermarks
  drivers: add Contiguous Memory Allocator
  X86: integrate CMA with DMA-mapping subsystem
  ARM: integrate CMA with DMA-mapping subsystem
  ARM: Samsung: use CMA for 2 memory banks for s5p-mfc device

Mel Gorman (1):
  mm: Serialize access to min_free_kbytes

Michal Nazarewicz (9):
  mm: page_alloc: remove trailing whitespace
  mm: compaction: introduce isolate_migratepages_range()
  mm: compaction: introduce map_pages()
  mm: compaction: introduce isolate_freepages_range()
  mm: compaction: export some of the functions
  mm: page_alloc: introduce alloc_contig_range()
  mm: page_alloc: change fallbacks array handling
  mm: mmzone: MIGRATE_CMA migration type added
  mm: page_isolation: MIGRATE_CMA isolation functions added

 Documentation/kernel-parameters.txt   |    9 +
 arch/Kconfig                          |    3 +
 arch/arm/Kconfig                      |    2 +
 arch/arm/include/asm/dma-contiguous.h |   16 ++
 arch/arm/include/asm/mach/map.h       |    1 +
 arch/arm/kernel/setup.c               |    9 +-
 arch/arm/mm/dma-mapping.c             |  368 ++++++++++++++++++++++++------
 arch/arm/mm/init.c                    |   24 ++-
 arch/arm/mm/mm.h                      |    3 +
 arch/arm/mm/mmu.c                     |   31 ++-
 arch/arm/plat-s5p/dev-mfc.c           |   51 +----
 arch/x86/Kconfig                      |    1 +
 arch/x86/include/asm/dma-contiguous.h |   13 +
 arch/x86/include/asm/dma-mapping.h    |    4 +
 arch/x86/kernel/pci-dma.c             |   18 ++-
 arch/x86/kernel/pci-nommu.c           |    8 +-
 arch/x86/kernel/setup.c               |    2 +
 drivers/base/Kconfig                  |   89 +++++++
 drivers/base/Makefile                 |    1 +
 drivers/base/dma-contiguous.c         |  403 +++++++++++++++++++++++++++++++
 include/asm-generic/dma-contiguous.h  |   27 ++
 include/linux/device.h                |    4 +
 include/linux/dma-contiguous.h        |  110 +++++++++
 include/linux/gfp.h                   |   12 +
 include/linux/mmzone.h                |   47 +++-
 include/linux/page-isolation.h        |   18 +-
 mm/Kconfig                            |    2 +-
 mm/Makefile                           |    3 +-
 mm/compaction.c                       |  418 +++++++++++++++++++++------------
 mm/internal.h                         |   33 +++
 mm/memory-failure.c                   |    2 +-
 mm/memory_hotplug.c                   |    6 +-
 mm/page_alloc.c                       |  413 ++++++++++++++++++++++++++++----
 mm/page_isolation.c                   |   15 +-
 mm/vmstat.c                           |    3 +
 35 files changed, 1794 insertions(+), 375 deletions(-)
 create mode 100644 arch/arm/include/asm/dma-contiguous.h
 create mode 100644 arch/x86/include/asm/dma-contiguous.h
 create mode 100644 drivers/base/dma-contiguous.c
 create mode 100644 include/asm-generic/dma-contiguous.h
 create mode 100644 include/linux/dma-contiguous.h

-- 
1.7.1.569.g6f426

13 years, 10 months


[PATCHv19 00/15] Contiguous Memory Allocator

by Marek Szyprowski

Welcome everyone!

Yes, that's true. This is yet another release of the Contiguous Memory Allocator patches. This version mainly includes code cleanups requested by Mel Gorman and a few minor bug fixes.

ARM integration code has not been changed since v16. It provides an implementation of the ideas that were discussed during the Linaro Sprint meeting in Cambourne, August 2011. Here are the details:

This version provides a solution for complete integration of CMA into the DMA mapping subsystem on the ARM architecture. The issue caused by double mapping of DMA pages and possible aliasing in the coherent memory mapping has finally been resolved, both for the GFP_ATOMIC case (allocations come from the coherent memory pool) and the non-GFP_ATOMIC case (allocations come from CMA managed areas).

For coherent, nommu, ARMv4 and ARMv5 systems the current DMA-mapping implementation has been kept.

For ARMv6+ systems, CMA has been enabled and a special pool of coherent memory for atomic allocations has been created. The size of this pool defaults to DEFAULT_CONSISTENT_DMA_SIZE/8, but can be changed with the coherent_pool kernel parameter (if really required).

All atomic allocations are served from this pool. I made a small simplification here: there is no separate pool for writecombine memory, so such requests are also served from the coherent pool. I don't think this simplification is a problem, because I found no driver that uses dma_alloc_writecombine with GFP_ATOMIC flags.

All non-atomic allocations are served from the CMA area. Kernel mappings are updated to reflect the required memory attribute changes. This is possible because during early boot all CMA areas are remapped with 4KiB pages in kernel low-memory.

This version has been tested on the Samsung S5PC110 based Goni machine and the Exynos4 UniversalC210 board with various V4L2 multimedia drivers.

Coherent atomic allocations have been tested by manually enabling the dma bounce for the s3c-sdhci device.

All patches are prepared for Linux Kernel v3.3-rc1.
A few words for those who see CMA for the first time:

The Contiguous Memory Allocator (CMA) makes it possible for device drivers to allocate big contiguous chunks of memory after the system has booted.

The main difference from similar frameworks is that CMA transparently reuses the memory region reserved for big chunk allocations as system memory, so no memory is wasted when no big chunk is allocated. Once an alloc request is issued, the framework migrates system pages to create the required big chunk of physically contiguous memory.

For more information you can refer to these nice LWN articles: http://lwn.net/Articles/447405/ and http://lwn.net/Articles/450286/ as well as the links to previous versions of the CMA framework.

The CMA framework was initially developed by Michal Nazarewicz at Samsung Poland R&D Center. Since version 9, I've taken over the development, because Michal has left the company. Since version v17 Michal has been working on the CMA patches again, and the current version is the result of our joint open-source effort.
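The reuse-then-migrate idea described above can be illustrated with a toy userspace model. This is only a sketch under loud assumptions: TOTAL_PAGES, CMA_START, the page_state values, migrate_out and cma_alloc_toy are all hypothetical names, a small array stands in for physical memory, and none of it reflects the actual CMA code paths (which build on the kernel's page migration and compaction machinery).

```c
#include <assert.h>
#include <stddef.h>

#define TOTAL_PAGES 16 /* hypothetical toy memory size, in pages */
#define CMA_START   8  /* pages [CMA_START, TOTAL_PAGES) are the reserved region */

enum page_state {
	PAGE_FREE,    /* unused */
	PAGE_MOVABLE, /* ordinary system page that may be migrated */
	PAGE_PINNED   /* part of a contiguous buffer handed to a device */
};

/* Move one movable page from the reserved region to a free page outside it. */
static int migrate_out(enum page_state *mem, size_t pfn)
{
	for (size_t i = 0; i < CMA_START; i++) {
		if (mem[i] == PAGE_FREE) {
			mem[i] = PAGE_MOVABLE;
			mem[pfn] = PAGE_FREE;
			return 0;
		}
	}
	return -1; /* nowhere to migrate to */
}

/*
 * Allocate count contiguous pages from the reserved region, migrating any
 * movable pages out of the way first. Returns the first page number of the
 * buffer, or -1 when no suitable window exists.
 */
static long cma_alloc_toy(enum page_state *mem, size_t count)
{
	assert(mem != NULL);

	if (count == 0)
		return -1;

	for (size_t base = CMA_START; base + count <= TOTAL_PAGES; base++) {
		size_t i;

		/* A window holding an existing buffer cannot be used. */
		for (i = 0; i < count; i++)
			if (mem[base + i] == PAGE_PINNED)
				break;
		if (i < count)
			continue;

		/* Evict system pages that currently borrow the window. */
		for (i = 0; i < count; i++)
			if (mem[base + i] == PAGE_MOVABLE &&
			    migrate_out(mem, base + i) != 0)
				return -1;

		for (i = 0; i < count; i++)
			mem[base + i] = PAGE_PINNED;
		return (long)base;
	}
	return -1;
}
```

Until cma_alloc_toy() is called, the reserved pages are free to hold movable system pages, which is the "no memory is wasted" property; the cost is paid at allocation time, when those pages have to be moved elsewhere.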
TODO (optional):
- implement support for contiguous memory areas placed in HIGHMEM zone
- resolve issue with movable pages with pending io operations

Best regards
Marek Szyprowski
Samsung Poland R&D Center

Links to previous versions of the patchset:
v18: <http://www.spinics.net/lists/linux-mm/msg28125.html>
v17: <http://www.spinics.net/lists/arm-kernel/msg148499.html>
v16: <http://www.spinics.net/lists/linux-mm/msg25066.html>
v15: <http://www.spinics.net/lists/linux-mm/msg23365.html>
v14: <http://www.spinics.net/lists/linux-media/msg36536.html>
v13: (internal, intentionally not released)
v12: <http://www.spinics.net/lists/linux-media/msg35674.html>
v11: <http://www.spinics.net/lists/linux-mm/msg21868.html>
v10: <http://www.spinics.net/lists/linux-mm/msg20761.html>
 v9: <http://article.gmane.org/gmane.linux.kernel.mm/60787>
 v8: <http://article.gmane.org/gmane.linux.kernel.mm/56855>
 v7: <http://article.gmane.org/gmane.linux.kernel.mm/55626>
 v6: <http://article.gmane.org/gmane.linux.kernel.mm/55626>
 v5: (intentionally left out as CMA v5 was identical to CMA v4)
 v4: <http://article.gmane.org/gmane.linux.kernel.mm/52010>
 v3: <http://article.gmane.org/gmane.linux.kernel.mm/51573>
 v2: <http://article.gmane.org/gmane.linux.kernel.mm/50986>
 v1: <http://article.gmane.org/gmane.linux.kernel.mm/50669>

Changelog:

v19:
    1. Addressed another set of comments and suggestions from Mel Gorman, mainly related to breaking patches into smaller, single-feature related chunks and rewriting already existing functions in memory compaction code.
    2. Completely reworked the page reclaim code, removed it from split_free_page() and introduced a direct call from alloc_contig_range().
    3. Merged a fix from Mans Rullgard for correct cma area limit alignment.
    4. Replaced broken "mm: page_alloc: set_migratetype_isolate: drain PCP prior to isolating" patch with "mm: page_alloc: update migrate type of pages on pcp when isolating" which is another attempt to solve this issue without touching free_pcppages_bulk().
    5. Rebased onto v3.3-rc1

v18:
    1. Addressed comments and suggestions from Mel Gorman related to changes in memory compaction code, most important points:
       - removed "mm: page_alloc: handle MIGRATE_ISOLATE in free_pcppages_bulk()" and moved all the logic to set_migratetype_isolate - see "mm: page_alloc: set_migratetype_isolate: drain PCP prior to isolating" patch
       - code in "mm: compaction: introduce isolate_{free,migrate}pages_range()" patch has been simplified and improved
       - removed "mm: mmzone: introduce zone_pfn_same_memmap()" patch
    2. Fixed crash on initialization if HIGHMEM is available on ARM platforms
    3. Fixed problems with allocation of contiguous memory if all free pages are occupied by page cache and reclaim is required.
    4. Added a workaround for temporary migration failures (now CMA tries to allocate a different memory block in such a case), which greatly increased the reliability of CMA.
    5. Minor cleanup here and there.
    6. Rebased onto v3.2-rc7 kernel tree.

v17:
    1. Replaced the whole CMA core memory migration code with the new one kindly provided by Michal Nazarewicz. The new code is based on the memory compaction framework, not on memory hotplug as it was before. This change was suggested by Mel Gorman.
    2. Addressed most of the comments from Andrew Morton and Mel Gorman in the rest of the CMA code.
    3. Fixed broken initialization on ARM systems with DMA zone enabled.
    4. Rebased onto v3.2-rc2 kernel.

v16:
    1. merged a fixup from Michal Nazarewicz to address comments from Dave Hansen about checking if pfns belong to the same memory zone
    2. merged a fix from Michal Nazarewicz for incorrect handling of pages which belong to a page block that is in MIGRATE_ISOLATE state; in very rare cases the migrate type of the page block might have been changed from MIGRATE_CMA to MIGRATE_MOVABLE because of this bug
    3. moved some common code to include/asm-generic
    4. added support for x86 DMA-mapping framework for pci-dma hardware, CMA can now be even more widely tested on KVM/QEMU and a lot of common x86 boxes
    5. rebased onto next-20111005 kernel tree, which includes changes in ARM DMA-mapping subsystem (CONSISTENT_DMA_SIZE removal)
    6. removed patch for CMA s5p-fimc device private regions (served only as example) and provided the one that matches real life case - s5p-mfc device

v15:
    1. fixed calculation of the total memory after activating CMA area (was broken from v12)
    2. more code cleanup in drivers/base/dma-contiguous.c
    3. added address limit for default CMA area
    4. rewrote ARM DMA integration:
       - removed "ARM: DMA: steal memory for DMA coherent mappings" patch
       - kept current DMA mapping implementation for coherent, nommu and ARMv4/ARMv5 systems
       - enabled CMA for all ARMv6+ systems
       - added separate, small pool for coherent atomic allocations, defaults to CONSISTENT_DMA_SIZE/8, but can be changed with kernel parameter coherent_pool=[size]

v14:
    1. Merged with "ARM: DMA: steal memory for DMA coherent mappings" patch, added support for GFP_ATOMIC allocations.
    2. Added checks for NULL device pointer

v13: (internal, intentionally not released)

v12:
    1. Fixed 2 nasty bugs in dma-contiguous allocator:
       - alignment argument was not passed correctly
       - range for dma_release_from_contiguous was not checked correctly
    2. Added support for architecture specific dma_contiguous_early_fixup() function
    3. CMA and DMA-mapping integration for ARM architecture has been rewritten to take care of the memory aliasing issue that might happen for newer ARM CPUs (mapping of the same pages with different cache attributes is forbidden). TODO: add support for GFP_ATOMIC allocations based on the "ARM: DMA: steal memory for DMA coherent mappings" patch and implement support for contiguous memory areas that are placed in HIGHMEM zone

v11:
    1. Removed genalloc usage and replaced it with direct calls to bitmap_* functions, dropped patches that are not needed anymore (genalloc extensions)
    2. Moved all contiguous area management code from mm/cma.c to drivers/base/dma-contiguous.c
    3. Renamed cm_alloc/free to dma_alloc/release_from_contiguous
    4. Introduced global, system wide (default) contiguous area configured with kernel config and kernel cmdline parameters
    5. Simplified initialization to just one function: dma_declare_contiguous()
    6. Added example of device private memory contiguous area

v10:
    1. Rebased onto 3.0-rc2 and resolved all conflicts
    2. Simplified CMA to be just a pure memory allocator, for use with platform/bus specific subsystems, like dma-mapping. Removed all device specific functions and calls.
    3. Integrated with ARM DMA-mapping subsystem.
    4. Code cleanup here and there.
    5. Removed private context support.

v9:
    1. Rebased onto 2.6.39-rc1 and resolved all conflicts
    2. Fixed a bunch of nasty bugs that happened when the allocation failed (mainly kernel oops due to NULL ptr dereference).
    3. Introduced testing code: cma-regions compatibility layer and videobuf2-cma memory allocator module.

v8:
    1. The alloc_contig_range() function has now been separated from CMA and put in page_allocator.c. This function tries to migrate all LRU pages in specified range and then allocate the range using alloc_contig_freed_pages().
    2. Support for MIGRATE_CMA has been separated from the CMA code. I have not tested if CMA works with ZONE_MOVABLE but I see no reasons why it shouldn't.
    3. I have added a @private argument when creating CMA contexts so that one can reserve memory and not share it with the rest of the system. This way, CMA acts only as allocation algorithm.

v7:
    1. A lot of functionality that handled driver->allocator_context mapping has been removed from the patchset. This is not to say that this code is not needed, it's just not worth posting everything in one patchset.
Currently, CMA is "just" an allocator. It uses it's own migratetype (MIGRATE_CMA) for defining ranges of pageblokcs which behave just like ZONE_MOVABLE but dispite the latter can be put in arbitrary places. 2. The migration code that was introduced in the previous version actually started working. v6: 1. Most importantly, v6 introduces support for memory migration. The implementation is not yet complete though. Migration support means that when CMA is not using memory reserved for it, page allocator can allocate pages from it. When CMA wants to use the memory, the pages have to be moved and/or evicted as to make room for CMA. To make it possible it must be guaranteed that only movable and reclaimable pages are allocated in CMA controlled regions. This is done by introducing a MIGRATE_CMA migrate type that guarantees exactly that. Some of the migration code is "borrowed" from Kamezawa Hiroyuki's alloc_contig_pages() implementation. The main difference is that thanks to MIGRATE_CMA migrate type CMA assumes that memory controlled by CMA are is always movable or reclaimable so that it makes allocation decisions regardless of the whether some pages are actually allocated and migrates them if needed. The most interesting patches from the patchset that implement the functionality are: 09/13: mm: alloc_contig_free_pages() added 10/13: mm: MIGRATE_CMA migration type added 11/13: mm: MIGRATE_CMA isolation functions added 12/13: mm: cma: Migration support added [wip] Currently, kernel panics in some situations which I am trying to investigate. 2. cma_pin() and cma_unpin() functions has been added (after a conversation with Johan Mossberg). The idea is that whenever hardware does not use the memory (no transaction is on) the chunk can be moved around. This would allow defragmentation to be implemented if desired. No defragmentation algorithm is provided at this time. 3. Sysfs support has been replaced with debugfs. 
I always felt unsure about the sysfs interface and when Greg KH pointed it out I finally got to rewrite it to debugfs. v5: (intentionally left out as CMA v5 was identical to CMA v4) v4: 1. The "asterisk" flag has been removed in favour of requiring that platform will provide a "*=<regions>" rule in the map attribute. 2. The terminology has been changed slightly renaming "kind" to "type" of memory. In the previous revisions, the documentation indicated that device drivers define memory kinds and now, v3: 1. The command line parameters have been removed (and moved to a separate patch, the fourth one). As a consequence, the cma_set_defaults() function has been changed -- it no longer accepts a string with list of regions but an array of regions. 2. The "asterisk" attribute has been removed. Now, each region has an "asterisk" flag which lets one specify whether this region should by considered "asterisk" region. 3. SysFS support has been moved to a separate patch (the third one in the series) and now also includes list of regions. v2: 1. The "cma_map" command line have been removed. In exchange, a SysFS entry has been created under kernel/mm/contiguous. The intended way of specifying the attributes is a cma_set_defaults() function called by platform initialisation code. "regions" attribute (the string specified by "cma" command line parameter) can be overwritten with command line parameter; the other attributes can be changed during run-time using the SysFS entries. 2. The behaviour of the "map" attribute has been modified slightly. Currently, if no rule matches given device it is assigned regions specified by the "asterisk" attribute. It is by default built from the region names given in "regions" attribute. 3. Devices can register private regions as well as regions that can be shared but are not reserved using standard CMA mechanisms. A private region has no name and can be accessed only by devices that have the pointer to it. 4. 
The way allocators are registered has changed. Currently, a cma_allocator_register() function is used for that purpose. Moreover, allocators are attached to regions the first time memory is registered from the region or when allocator is registered which means that allocators can be dynamic modules that are loaded after the kernel booted (of course, it won't be possible to allocate a chunk of memory from a region if allocator is not loaded). 5. Index of new functions: +static inline dma_addr_t __must_check +cma_alloc_from(const char *regions, size_t size, + dma_addr_t alignment) +static inline int +cma_info_about(struct cma_info *info, const const char *regions) +int __must_check cma_region_register(struct cma_region *reg); +dma_addr_t __must_check +cma_alloc_from_region(struct cma_region *reg, + size_t size, dma_addr_t alignment); +static inline dma_addr_t __must_check +cma_alloc_from(const char *regions, + size_t size, dma_addr_t alignment); +int cma_allocator_register(struct cma_allocator *alloc); Patches in this patchset: Marek Szyprowski (6): mm: extract reclaim code from __alloc_pages_direct_reclaim() mm: trigger page reclaim in alloc_contig_range() to stabilize watermarks drivers: add Contiguous Memory Allocator X86: integrate CMA with DMA-mapping subsystem ARM: integrate CMA with DMA-mapping subsystem ARM: Samsung: use CMA for 2 memory banks for s5p-mfc device Michal Nazarewicz (9): mm: page_alloc: remove trailing whitespace mm: page_alloc: update migrate type of pages on pcp when isolating mm: compaction: introduce isolate_migratepages_range(). 
mm: compaction: introduce isolate_freepages_range() mm: compaction: export some of the functions mm: page_alloc: introduce alloc_contig_range() mm: page_alloc: change fallbacks array handling mm: mmzone: MIGRATE_CMA migration type added mm: page_isolation: MIGRATE_CMA isolation functions added Documentation/kernel-parameters.txt | 9 + arch/Kconfig | 3 + arch/arm/Kconfig | 2 + arch/arm/include/asm/dma-contiguous.h | 16 ++ arch/arm/include/asm/mach/map.h | 1 + arch/arm/kernel/setup.c | 9 +- arch/arm/mm/dma-mapping.c | 368 ++++++++++++++++++++++++------ arch/arm/mm/init.c | 22 ++- arch/arm/mm/mm.h | 3 + arch/arm/mm/mmu.c | 31 ++- arch/arm/plat-s5p/dev-mfc.c | 51 +---- arch/x86/Kconfig | 1 + arch/x86/include/asm/dma-contiguous.h | 13 + arch/x86/include/asm/dma-mapping.h | 4 + arch/x86/kernel/pci-dma.c | 18 ++- arch/x86/kernel/pci-nommu.c | 8 +- arch/x86/kernel/setup.c | 2 + drivers/base/Kconfig | 89 +++++++ drivers/base/Makefile | 1 + drivers/base/dma-contiguous.c | 404 ++++++++++++++++++++++++++++++++ include/asm-generic/dma-contiguous.h | 27 +++ include/linux/device.h | 4 + include/linux/dma-contiguous.h | 110 +++++++++ include/linux/mmzone.h | 43 +++- include/linux/page-isolation.h | 35 ++- mm/Kconfig | 2 +- mm/Makefile | 3 +- mm/compaction.c | 414 +++++++++++++++++++++------------ mm/internal.h | 33 +++ mm/memory-failure.c | 2 +- mm/memory_hotplug.c | 6 +- mm/page_alloc.c | 355 +++++++++++++++++++++++++--- mm/page_isolation.c | 39 +++- mm/vmstat.c | 3 + 34 files changed, 1770 insertions(+), 361 deletions(-) create mode 100644 arch/arm/include/asm/dma-contiguous.h create mode 100644 arch/x86/include/asm/dma-contiguous.h create mode 100644 drivers/base/dma-contiguous.c create mode 100644 include/asm-generic/dma-contiguous.h create mode 100644 include/linux/dma-contiguous.h -- 1.7.1.569.g6f426
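
The changelog describes the migration idea (v6), the retry on a different block after a temporary migration failure (v18, item 4) and the open TODO about movable pages with pending io. As a rough illustration only, here is a userspace toy model of that flow. It is not the kernel's alloc_contig_range(); every name (toy_page, toy_alloc_contig_range, toy_cma_alloc_retry) and the 16-page "memory" are assumptions made up for this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_NPAGES 16 /* hypothetical size of the modelled memory */

/* Hypothetical page states; TOY_BUSY stands in for a page that cannot
 * be migrated right now (for example, one with pending io). */
enum toy_page { TOY_FREE, TOY_MOVABLE, TOY_BUSY, TOY_CMA_OWNED };

/* Find a free page outside [lo, hi) that can receive a migrated page. */
static int toy_find_target(const enum toy_page *mem, size_t lo, size_t hi)
{
    for (size_t i = 0; i < TOY_NPAGES; i++)
        if ((i < lo || i >= hi) && mem[i] == TOY_FREE)
            return (int)i;
    return -1;
}

/* Try to claim [lo, hi): check first, then migrate, so a failed attempt
 * leaves the memory untouched. */
static bool toy_alloc_contig_range(enum toy_page *mem, size_t lo, size_t hi)
{
    size_t need = 0, avail = 0;

    assert(lo < hi && hi <= TOY_NPAGES);
    for (size_t i = 0; i < TOY_NPAGES; i++) {
        bool inside = (i >= lo && i < hi);

        if (inside && (mem[i] == TOY_BUSY || mem[i] == TOY_CMA_OWNED))
            return false;       /* range cannot be emptied right now */
        if (inside && mem[i] == TOY_MOVABLE)
            need++;             /* page that must be migrated out */
        if (!inside && mem[i] == TOY_FREE)
            avail++;            /* possible migration target */
    }
    if (need > avail)
        return false;

    for (size_t i = lo; i < hi; i++) {
        if (mem[i] == TOY_MOVABLE) {
            int target = toy_find_target(mem, lo, hi);

            assert(target >= 0); /* guaranteed by the need/avail check */
            mem[target] = TOY_MOVABLE;
        }
        mem[i] = TOY_CMA_OWNED;
    }
    return true;
}

/* On failure, move on to the next block instead of giving up. Returns the
 * first page index of the claimed block, or -1. */
static int toy_cma_alloc_retry(enum toy_page *mem, size_t count)
{
    if (count == 0)
        return -1;
    for (size_t lo = 0; lo + count <= TOY_NPAGES; lo += count)
        if (toy_alloc_contig_range(mem, lo, lo + count))
            return (int)lo;
    return -1;
}
```

In this model a single busy page makes the first block fail, and the allocator falls through to the next block, moving the movable pages it finds there into free pages elsewhere. The real code works on pageblocks and the compaction machinery rather than a flat array.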

13 years, 10 months


[PATCHv20 00/15] Contiguous Memory Allocator

by Marek Szyprowski

Welcome everyone again!

This is yet another quick update on the Contiguous Memory Allocator patches. This version includes another set of code cleanups requested by Mel Gorman and a few minor bug fixes. I really hope that this version will be accepted for merging and that future development will be handled by incremental patches.

ARM integration code has not been changed since v16. It implements the ideas that were discussed during the Linaro Sprint meeting in Cambourne, August 2011. Here are the details:

This version provides a solution for complete integration of CMA into the DMA mapping subsystem on the ARM architecture. The issue caused by double DMA page mapping and possible aliasing in coherent memory mappings has finally been resolved, both for the GFP_ATOMIC case (allocations come from the coherent memory pool) and the non-GFP_ATOMIC case (allocations come from CMA managed areas).

For coherent, nommu, ARMv4 and ARMv5 systems the current DMA-mapping implementation has been kept.

For ARMv6+ systems, CMA has been enabled and a special pool of coherent memory for atomic allocations has been created. The size of this pool defaults to DEFAULT_CONSISTENT_DMA_SIZE/8, but can be changed with the coherent_pool kernel parameter (if really required).

All atomic allocations are served from this pool. I made a small simplification here: there is no separate pool for writecombine memory, so such requests are also served from the coherent pool. I don't think this simplification is a problem - I found no driver that uses dma_alloc_writecombine with GFP_ATOMIC flags.

All non-atomic allocations are served from the CMA area. Kernel mappings are updated to reflect the required memory attribute changes. This is possible because during early boot, all CMA areas are remapped with 4KiB pages in kernel low-memory.

This version has been tested on the Samsung S5PC110 based Goni machine and the Exynos4 UniversalC210 board with various V4L2 multimedia drivers.

Coherent atomic allocations have been tested by manually enabling the dma bounce for the s3c-sdhci device.

All patches are prepared on top of Linux Kernel v3.3-rc2.

A few words for those who see CMA for the first time:

The Contiguous Memory Allocator (CMA) makes it possible for device drivers to allocate big contiguous chunks of memory after the system has booted.

The main difference from similar frameworks is that CMA transparently reuses the memory region reserved for big chunk allocations as system memory, so no memory is wasted when no big chunk is allocated. Once an alloc request is issued, the framework migrates system pages to create the required big chunk of physically contiguous memory.

For more information you can refer to these nice LWN articles: http://lwn.net/Articles/447405/ and http://lwn.net/Articles/450286/ as well as the links to previous versions of the CMA framework.

The CMA framework was initially developed by Michal Nazarewicz at Samsung Poland R&D Center. Since version 9, I've taken over the development, because Michal has left the company. Since version v17 Michal is working on CMA patches again, and the current version is the result of our joint open-source effort.
Best regards
Marek Szyprowski
Samsung Poland R&D Center

Links to previous versions of the patchset:
v19: <http://www.spinics.net/lists/linux-mm/msg29145.html>
v18: <http://www.spinics.net/lists/linux-mm/msg28125.html>
v17: <http://www.spinics.net/lists/arm-kernel/msg148499.html>
v16: <http://www.spinics.net/lists/linux-mm/msg25066.html>
v15: <http://www.spinics.net/lists/linux-mm/msg23365.html>
v14: <http://www.spinics.net/lists/linux-media/msg36536.html>
v13: (internal, intentionally not released)
v12: <http://www.spinics.net/lists/linux-media/msg35674.html>
v11: <http://www.spinics.net/lists/linux-mm/msg21868.html>
v10: <http://www.spinics.net/lists/linux-mm/msg20761.html>
 v9: <http://article.gmane.org/gmane.linux.kernel.mm/60787>
 v8: <http://article.gmane.org/gmane.linux.kernel.mm/56855>
 v7: <http://article.gmane.org/gmane.linux.kernel.mm/55626>
 v6: <http://article.gmane.org/gmane.linux.kernel.mm/55626>
 v5: (intentionally left out as CMA v5 was identical to CMA v4)
 v4: <http://article.gmane.org/gmane.linux.kernel.mm/52010>
 v3: <http://article.gmane.org/gmane.linux.kernel.mm/51573>
 v2: <http://article.gmane.org/gmane.linux.kernel.mm/50986>
 v1: <http://article.gmane.org/gmane.linux.kernel.mm/50669>

Changelog:

v20:
1. Addressed even more comments from Mel Gorman and added his Acked-by tag on most of the core memory management patches.
2. Squashed a few minor fixes here and there (corrected alignment calculation for region limit, added adjusting low watermark level on reclaim, fixed return value of __alloc_contig_migrate_range function)
3. Removed problematic "mm: page_alloc: update migrate type of pages on pcp when isolating" patch and slightly altered MIGRATE_CMA type handling, which solved the problem
4. Rebased onto v3.3-rc2

v19:
1. Addressed another set of comments and suggestions from Mel Gorman, mainly related to breaking patches into smaller, single-feature related chunks and rewriting already existing functions in memory compaction code.
2. Completely reworked page reclaim code, removed it from split_free_page() and introduced a direct call from alloc_contig_range().
3. Merged a fix from Mans Rullgard for correct cma area limit alignment.
4. Replaced broken "mm: page_alloc: set_migratetype_isolate: drain PCP prior to isolating" patch with "mm: page_alloc: update migrate type of pages on pcp when isolating" which is another attempt to solve this issue without touching free_pcppages_bulk().
5. Rebased onto v3.3-rc1

v18:
1. Addressed comments and suggestions from Mel Gorman related to changes in memory compaction code, most important points:
   - removed "mm: page_alloc: handle MIGRATE_ISOLATE in free_pcppages_bulk()" and moved all the logic to set_migratetype_isolate - see "mm: page_alloc: set_migratetype_isolate: drain PCP prior to isolating" patch
   - code in "mm: compaction: introduce isolate_{free,migrate}pages_range()" patch has been simplified and improved
   - removed "mm: mmzone: introduce zone_pfn_same_memmap()" patch
2. Fixed crash on initialization if HIGHMEM is available on ARM platforms
3. Fixed problems with allocation of contiguous memory if all free pages are occupied by page cache and reclaim is required.
4. Added a workaround for temporary migration failures (now CMA tries to allocate a different memory block in such a case), which heavily increased the reliability of CMA.
5. Minor cleanup here and there.
6. Rebased onto v3.2-rc7 kernel tree.

v17:
1. Replaced the whole CMA core memory migration code with the new one kindly provided by Michal Nazarewicz. The new code is based on the memory compaction framework, not on memory hotplug as it was before. This change has been suggested by Mel Gorman.
2. Addressed most of the comments from Andrew Morton and Mel Gorman in the rest of the CMA code.
3. Fixed broken initialization on ARM systems with DMA zone enabled.
4. Rebased onto v3.2-rc2 kernel.

v16:
1. merged a fixup from Michal Nazarewicz to address comments from Dave Hansen about checking if pfns belong to the same memory zone
2. merged a fix from Michal Nazarewicz for incorrect handling of pages which belong to a page block that is in MIGRATE_ISOLATE state; in very rare cases the migrate type of the page block might have been changed from MIGRATE_CMA to MIGRATE_MOVABLE because of this bug
3. moved some common code to include/asm-generic
4. added support for x86 DMA-mapping framework for pci-dma hardware, CMA can now be even more widely tested on KVM/QEMU and a lot of common x86 boxes
5. rebased onto next-20111005 kernel tree, which includes changes in ARM DMA-mapping subsystem (CONSISTENT_DMA_SIZE removal)
6. removed patch for CMA s5p-fimc device private regions (served only as example) and provided the one that matches a real life case - s5p-mfc device

v15:
1. fixed calculation of the total memory after activating CMA area (was broken from v12)
2. more code cleanup in drivers/base/dma-contiguous.c
3. added address limit for default CMA area
4. rewrote ARM DMA integration:
   - removed "ARM: DMA: steal memory for DMA coherent mappings" patch
   - kept current DMA mapping implementation for coherent, nommu and ARMv4/ARMv5 systems
   - enabled CMA for all ARMv6+ systems
   - added separate, small pool for coherent atomic allocations, defaults to CONSISTENT_DMA_SIZE/8, but can be changed with kernel parameter coherent_pool=[size]

v14:
1. Merged with "ARM: DMA: steal memory for DMA coherent mappings" patch, added support for GFP_ATOMIC allocations.
2. Added checks for NULL device pointer

v13: (internal, intentionally not released)

v12:
1. Fixed 2 nasty bugs in dma-contiguous allocator:
   - alignment argument was not passed correctly
   - range for dma_release_from_contiguous was not checked correctly
2. Added support for architecture specific dma_contiguous_early_fixup() function
3. CMA and DMA-mapping integration for ARM architecture has been rewritten to take care of the memory aliasing issue that might happen for newer ARM CPUs (mapping of the same pages with different cache attributes is forbidden).

TODO: add support for GFP_ATOMIC allocations based on the "ARM: DMA: steal memory for DMA coherent mappings" patch and implement support for contiguous memory areas that are placed in HIGHMEM zone

v11:
1. Removed genalloc usage and replaced it with direct calls to bitmap_* functions, dropped patches that are not needed anymore (genalloc extensions)
2. Moved all contiguous area management code from mm/cma.c to drivers/base/dma-contiguous.c
3. Renamed cm_alloc/free to dma_alloc/release_from_contiguous
4. Introduced global, system wide (default) contiguous area configured with kernel config and kernel cmdline parameters
5. Simplified initialization to just one function: dma_declare_contiguous()
6. Added example of device private memory contiguous area

v10:
1. Rebased onto 3.0-rc2 and resolved all conflicts
2. Simplified CMA to be just a pure memory allocator, for use with platform/bus specific subsystems, like dma-mapping. Removed all device specific functions and calls.
3. Integrated with ARM DMA-mapping subsystem.
4. Code cleanup here and there.
5. Removed private context support.

v9:
1. Rebased onto 2.6.39-rc1 and resolved all conflicts
2. Fixed a bunch of nasty bugs that happened when the allocation failed (mainly kernel oops due to NULL ptr dereference).
3. Introduced testing code: cma-regions compatibility layer and videobuf2-cma memory allocator module.

v8:
1. The alloc_contig_range() function has now been separated from CMA and put in page_allocator.c. This function tries to migrate all LRU pages in specified range and then allocate the range using alloc_contig_freed_pages().
2. Support for MIGRATE_CMA has been separated from the CMA code. I have not tested if CMA works with ZONE_MOVABLE but I see no reasons why it shouldn't.
3. I have added a @private argument when creating CMA contexts so that one can reserve memory and not share it with the rest of the system. This way, CMA acts only as allocation algorithm.

v7:
1. A lot of functionality that handled driver->allocator_context mapping has been removed from the patchset. This is not to say that this code is not needed, it's just not worth posting everything in one patchset.

   Currently, CMA is "just" an allocator. It uses its own migratetype (MIGRATE_CMA) for defining ranges of pageblocks which behave just like ZONE_MOVABLE but, unlike the latter, can be put in arbitrary places.
2. The migration code that was introduced in the previous version actually started working.

v6:
1. Most importantly, v6 introduces support for memory migration. The implementation is not yet complete though.

   Migration support means that when CMA is not using memory reserved for it, page allocator can allocate pages from it. When CMA wants to use the memory, the pages have to be moved and/or evicted so as to make room for CMA.

   To make it possible it must be guaranteed that only movable and reclaimable pages are allocated in CMA controlled regions. This is done by introducing a MIGRATE_CMA migrate type that guarantees exactly that.

   Some of the migration code is "borrowed" from Kamezawa Hiroyuki's alloc_contig_pages() implementation. The main difference is that thanks to MIGRATE_CMA migrate type CMA assumes that memory controlled by CMA is always movable or reclaimable so that it makes allocation decisions regardless of whether some pages are actually allocated and migrates them if needed.

   The most interesting patches from the patchset that implement the functionality are:

     09/13: mm: alloc_contig_free_pages() added
     10/13: mm: MIGRATE_CMA migration type added
     11/13: mm: MIGRATE_CMA isolation functions added
     12/13: mm: cma: Migration support added [wip]

   Currently, kernel panics in some situations which I am trying to investigate.
2. cma_pin() and cma_unpin() functions have been added (after a conversation with Johan Mossberg). The idea is that whenever hardware does not use the memory (no transaction is on) the chunk can be moved around. This would allow defragmentation to be implemented if desired. No defragmentation algorithm is provided at this time.
3. Sysfs support has been replaced with debugfs. I always felt unsure about the sysfs interface and when Greg KH pointed it out I finally got to rewrite it to debugfs.

v5: (intentionally left out as CMA v5 was identical to CMA v4)

v4:
1. The "asterisk" flag has been removed in favour of requiring that platform will provide a "*=<regions>" rule in the map attribute.
2. The terminology has been changed slightly renaming "kind" to "type" of memory. In the previous revisions, the documentation indicated that device drivers define memory kinds and now,

v3:
1. The command line parameters have been removed (and moved to a separate patch, the fourth one). As a consequence, the cma_set_defaults() function has been changed -- it no longer accepts a string with list of regions but an array of regions.
2. The "asterisk" attribute has been removed. Now, each region has an "asterisk" flag which lets one specify whether this region should be considered an "asterisk" region.
3. SysFS support has been moved to a separate patch (the third one in the series) and now also includes list of regions.

v2:
1. The "cma_map" command line has been removed. In exchange, a SysFS entry has been created under kernel/mm/contiguous.

   The intended way of specifying the attributes is a cma_set_defaults() function called by platform initialisation code. "regions" attribute (the string specified by "cma" command line parameter) can be overwritten with command line parameter; the other attributes can be changed during run-time using the SysFS entries.
2. The behaviour of the "map" attribute has been modified slightly. Currently, if no rule matches given device it is assigned regions specified by the "asterisk" attribute. It is by default built from the region names given in "regions" attribute.
3. Devices can register private regions as well as regions that can be shared but are not reserved using standard CMA mechanisms. A private region has no name and can be accessed only by devices that have the pointer to it.
4. The way allocators are registered has changed. Currently, a cma_allocator_register() function is used for that purpose. Moreover, allocators are attached to regions the first time memory is registered from the region or when allocator is registered which means that allocators can be dynamic modules that are loaded after the kernel booted (of course, it won't be possible to allocate a chunk of memory from a region if allocator is not loaded).
5. Index of new functions:

+static inline dma_addr_t __must_check
+cma_alloc_from(const char *regions, size_t size,
+               dma_addr_t alignment)

+static inline int
+cma_info_about(struct cma_info *info, const const char *regions)

+int __must_check cma_region_register(struct cma_region *reg);

+dma_addr_t __must_check
+cma_alloc_from_region(struct cma_region *reg,
+                      size_t size, dma_addr_t alignment);

+static inline dma_addr_t __must_check
+cma_alloc_from(const char *regions,
+               size_t size, dma_addr_t alignment);

+int cma_allocator_register(struct cma_allocator *alloc);

Patches in this patchset:

Marek Szyprowski (6):
  mm: extract reclaim code from __alloc_pages_direct_reclaim()
  mm: trigger page reclaim in alloc_contig_range() to stabilize watermarks
  drivers: add Contiguous Memory Allocator
  X86: integrate CMA with DMA-mapping subsystem
  ARM: integrate CMA with DMA-mapping subsystem
  ARM: Samsung: use CMA for 2 memory banks for s5p-mfc device

Michal Nazarewicz (9):
  mm: page_alloc: remove trailing whitespace
  mm: compaction: introduce isolate_migratepages_range().
  mm: compaction: introduce map_pages()
  mm: compaction: introduce isolate_freepages_range()
  mm: compaction: export some of the functions
  mm: page_alloc: introduce alloc_contig_range()
  mm: page_alloc: change fallbacks array handling
  mm: mmzone: MIGRATE_CMA migration type added
  mm: page_isolation: MIGRATE_CMA isolation functions added

 Documentation/kernel-parameters.txt | 9 +
 arch/Kconfig | 3 +
 arch/arm/Kconfig | 2 +
 arch/arm/include/asm/dma-contiguous.h | 16 ++
 arch/arm/include/asm/mach/map.h | 1 +
 arch/arm/kernel/setup.c | 9 +-
 arch/arm/mm/dma-mapping.c | 368 ++++++++++++++++++++++++------
 arch/arm/mm/init.c | 22 ++-
 arch/arm/mm/mm.h | 3 +
 arch/arm/mm/mmu.c | 31 ++-
 arch/arm/plat-s5p/dev-mfc.c | 51 +----
 arch/x86/Kconfig | 1 +
 arch/x86/include/asm/dma-contiguous.h | 13 +
 arch/x86/include/asm/dma-mapping.h | 4 +
 arch/x86/kernel/pci-dma.c | 18 ++-
 arch/x86/kernel/pci-nommu.c | 8 +-
 arch/x86/kernel/setup.c | 2 +
 drivers/base/Kconfig | 89 +++++++
 drivers/base/Makefile | 1 +
 drivers/base/dma-contiguous.c | 405 ++++++++++++++++++++++++++++++++
 include/asm-generic/dma-contiguous.h | 27 ++
 include/linux/device.h | 4 +
 include/linux/dma-contiguous.h | 110 +++++++++
 include/linux/gfp.h | 12 +
 include/linux/mmzone.h | 38 +++-
 include/linux/page-isolation.h | 18 +-
 mm/Kconfig | 2 +-
 mm/Makefile | 3 +-
 mm/compaction.c | 418 +++++++++++++++++++++------------
 mm/internal.h | 33 +++
 mm/memory-failure.c | 2 +-
 mm/memory_hotplug.c | 6 +-
 mm/page_alloc.c | 377 ++++++++++++++++++++++++++---
 mm/page_isolation.c | 15 +-
 mm/vmstat.c | 3 +
 35 files changed, 1757 insertions(+), 367 deletions(-)
 create mode 100644 arch/arm/include/asm/dma-contiguous.h
 create mode 100644 arch/x86/include/asm/dma-contiguous.h
 create mode 100644 drivers/base/dma-contiguous.c
 create mode 100644 include/asm-generic/dma-contiguous.h
 create mode 100644 include/linux/dma-contiguous.h
--
1.7.1.569.g6f426
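
The v11 and v12 changelog entries mention that the contiguous area is tracked with plain bitmap operations, and that two bugs concerned the alignment argument and the range check on release. A minimal userspace sketch of such a bitmap-backed area might look as follows. It is only an illustration under stated assumptions: the 64-page area, the single 64-bit word, and all toy_* names are invented here and do not come from drivers/base/dma-contiguous.c.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_AREA_PAGES 64u /* hypothetical area size: one bit per page */

struct toy_cma {
    uint64_t bitmap; /* bit i set means page i is allocated */
};

/* Mask with `count` bits set, starting at bit `start`. */
static uint64_t toy_mask(unsigned start, unsigned count)
{
    assert(count >= 1 && start + count <= TOY_AREA_PAGES);
    uint64_t bits = (count == 64) ? ~UINT64_C(0)
                                  : ((UINT64_C(1) << count) - 1);
    return bits << start;
}

/* Allocate `count` pages whose first index is a multiple of
 * (1 << align_order). Returns the first page index, or -1. */
static int toy_cma_alloc(struct toy_cma *cma, unsigned count,
                         unsigned align_order)
{
    if (count == 0 || count > TOY_AREA_PAGES || align_order > 6)
        return -1;

    unsigned step = 1u << align_order;

    for (unsigned start = 0; start + count <= TOY_AREA_PAGES; start += step) {
        uint64_t mask = toy_mask(start, count);

        if ((cma->bitmap & mask) == 0) { /* whole run is free */
            cma->bitmap |= mask;
            return (int)start;
        }
    }
    return -1;
}

/* Release a run; refuses ranges that leave the area or that are not
 * fully allocated. */
static bool toy_cma_release(struct toy_cma *cma, int start, unsigned count)
{
    if (start < 0 || count == 0 ||
        (unsigned)start + count > TOY_AREA_PAGES)
        return false;

    uint64_t mask = toy_mask((unsigned)start, count);

    if ((cma->bitmap & mask) != mask)
        return false;
    cma->bitmap &= ~mask;
    return true;
}
```

The sketch only covers the bookkeeping. In the patchset, a successful bitmap reservation is followed by the actual migration of whatever pages currently occupy that range.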

13 years, 10 months


[RFC][PATCH 1/1] gpu: ion: Add IOMMU heap allocator with IOMMU API

by Hiroshi Doyu

Hi,

Recently we've implemented an IOMMU heap, attached, as one of the ION memory manager(*1) heaps/backends. This implementation is completely independent of any SoC and can be used for other SoCs as well. If our implementation is not totally wrong, it would be nice to share some experience/code here, since Ion is still not so clear to me yet.

I found that Linaro also seems to have started some ION work(*2). I think that some of the Ion features could be supported/replaced with Linaro UMM. For example, presently "ion_iommu_heap" is implemented with the standard IOMMU API, but could it also be implemented with the coming DMA API? Also DMABUF can be used in the Ion core part as well, I guess.

Currently there's no Ion memmgr code in the upstream "drivers/staging/android"(*3). Is there any plan to support this? Or is this considered a completely _temporary_ solution that is never going to be added?

It would be nice if we could share some of our effort here, since quite a few Android users need Ion, even if only temporarily.

Any comment would be really appreciated.

Hiroshi DOYU

*1: https://android.googlesource.com/kernel/common.git
$ git clone https://android.googlesource.com/kernel/common.git
$ cd common
$ git checkout -b android origin/android-3.0
$ git grep -e "<linux/ion.h>" drivers/
drivers/gpu/ion/ion.c:#include <linux/ion.h>
drivers/gpu/ion/ion_carveout_heap.c:#include <linux/ion.h>
drivers/gpu/ion/ion_heap.c:#include <linux/ion.h>
drivers/gpu/ion/ion_priv.h:#include <linux/ion.h>
drivers/gpu/ion/ion_system_heap.c:#include <linux/ion.h>
drivers/gpu/ion/ion_system_mapper.c:#include <linux/ion.h>
drivers/gpu/ion/tegra/tegra_ion.c:#include <linux/ion.h>
*2: https://blueprints.launchpad.net/linaro-mm-sig/+spec/linaro-mmwg-cma-ion
*3: http://git.kernel.org/?p=linux/kernel/git/next/linux-next.git;a=tree;f=driv…

13 years, 10 months


dma_buf presentation at fosdem in brussel

by Daniel Vetter

Hi all,

Just a quick heads up for those who're interested, phoronix recorded my dma_buf talk at fosdem. It starts with a quick recap about what v1 is, but it's mostly about what's still missing and what we still need to solve imo:

http://www.phoronix.com/scan.php?page=news_item&px=MTA1NDE

Slides are on my fdo account:

http://people.freedesktop.org/~danvet/presentations/fosdem2012-dma_buf.odp

Cheers, Daniel
--
Daniel Vetter
Mail: daniel(a)ffwll.ch
Mobile: +41 (0)79 365 57 48

13 years, 10 months


[RFCv1 0/4] v4l: DMA buffer sharing support as a user

by Sumit Semwal

Hello Everyone,

A very happy new year 2012! :)

This patchset is an RFC for the way videobuf2 can be adapted to add support for DMA buffer sharing framework[1].

The original patch-set for the idea, and PoC of buffer sharing was by Tomasz Stanislawski <t.stanislaws(a)samsung.com>, who demonstrated buffer sharing between two v4l2 devices[2]. This RFC is needed to adapt these patches to the changes that have happened in the DMA buffer sharing framework over past few months.

To begin with, I have tried to adapt only the dma-contig allocator, and only as a user of dma-buf buffer. I am currently working on the v4l2-as-an-exporter changes, and will share as soon as I get it in some shape.

As with the PoC [2], the handle for sharing buffers is a file-descriptor (fd). The usage documentation is also a part of [1].

So, the current RFC has the following limitations:
- Only buffer sharing as a buffer user,
- doesn't handle cases where even for a contiguous buffer, the sg_table can have more than one scatterlist entry.

Thanks and best regards,
~Sumit.

[1]: dma-buf patchset at: https://lkml.org/lkml/2011/12/26/29
[2]: http://lwn.net/Articles/454389

Sumit Semwal (4):
  v4l: Add DMABUF as a memory type
  v4l:vb2: add support for shared buffer (dma_buf)
  v4l:vb: remove warnings about MEMORY_DMABUF
  v4l:vb2: Add dma-contig allocator as dma_buf user

 drivers/media/video/videobuf-core.c        |    4 +
 drivers/media/video/videobuf2-core.c       |  186 +++++++++++++++++++++++++++-
 drivers/media/video/videobuf2-dma-contig.c |  125 +++++++++++++++++++
 include/linux/videodev2.h                  |    8 ++
 include/media/videobuf2-core.h             |   30 +++++
 5 files changed, 352 insertions(+), 1 deletions(-)

-- 
1.7.5.4

13 years, 10 months


[RFCv1 0/6] PASR: Partial Array Self-Refresh Framework

by Maxime Coquelin

The PASR framework brings support for the Partial Array Self-Refresh DDR power management feature. PASR has been introduced in LP-DDR2, and is also present in DDR3.

PASR provides 4 modes:

* Single-Ended: Only 1/1, 1/2, 1/4 or 1/8 are refreshed, masking starting at the end of the DDR die.

* Double-Ended: Same as Single-Ended, but refresh-masking does not necessarily start at the end of the DDR die.

* Bank-Selective: Refresh of each bank of a die can be masked or unmasked via a dedicated DDR register (MR16). This mode is convenient for DDR configured in BRC (Bank-Row-Column) mode.

* Segment-Selective: Refresh of each segment of a die can be masked or unmasked via a dedicated DDR register (MR17). This mode is convenient for DDR configured in RBC (Row-Bank-Column) mode.

The role of this framework is to stop the refresh of unused memory to reduce DDR power consumption.

It supports the Bank-Selective and Segment-Selective modes, as these are the best suited to modern OSes.

At early boot stage, a representation of the physical DDR layout is built:

             Die 0
 _______________________________
| I--------------------------I |
| I    Bank or Segment 0     I |
| I--------------------------I |
| I--------------------------I |
| I    Bank or Segment 1     I |
| I--------------------------I |
| I--------------------------I |
| I   Bank or Segment ...    I |
| I--------------------------I |
| I--------------------------I |
| I    Bank or Segment n     I |
| I--------------------------I |
|______________________________|
              ...

             Die n
 _______________________________
| I--------------------------I |
| I    Bank or Segment 0     I |
| I--------------------------I |
| I--------------------------I |
| I    Bank or Segment 1     I |
| I--------------------------I |
| I--------------------------I |
| I   Bank or Segment ...    I |
| I--------------------------I |
| I--------------------------I |
| I    Bank or Segment n     I |
| I--------------------------I |
|______________________________|

The first level is a table where elements represent a die:
* Base address,
* Number of segments,
* Table representing banks/segments,
* MR16/MR17 refresh mask,
* DDR Controller callback to update MR16/MR17 refresh mask.

The second level is the section tables representing the banks or segments, depending on hardware configuration:
* Base address,
* Unused memory size counter,
* Possible pointer to another section it depends on (E.g. Interleaving)

When some memory becomes unused, the allocator owning this memory calls the PASR Framework's pasr_put(phys_addr, size) function. The framework finds the sections impacted and updates their counters accordingly. If a section counter reaches the section size, the refresh of the section is masked. If the corresponding section has a dependency with another section (E.g. because of DDR interleaving, see figure below), it checks that the "paired" section is also unused before updating the refresh mask.

When some unused memory is requested by the allocator, the allocator owning this memory calls the PASR Framework's pasr_get(phys_addr, size) function. The framework finds the sections impacted and updates their counters accordingly. If, before the update, the section counter was equal to the section size, the refresh of the section is unmasked. If the corresponding section has a dependency with another section, it also unmasks the refresh of the other section.

Patch 3/6 contains modifications for the Buddy allocator. The overhead induced is very low because the PASR framework is notified only on "MAX_ORDER" pageblocks.

Support for other allocators (PMEM, HWMEM...) and Memory Hotplug would be added in later revisions of the patch set.

Maxime Coquelin (6):
  PASR: Initialize DDR layout
  PASR: Add core Framework
  PASR: mm: Integrate PASR in Buddy allocator
  PASR: Call PASR initialization
  PASR: Add Documentation
  PASR: Ux500: Add PASR support

 Documentation/pasr.txt                      |  183 ++++++++++++
 arch/arm/Kconfig                            |    1 +
 arch/arm/kernel/setup.c                     |    1 +
 arch/arm/mach-ux500/include/mach/hardware.h |   11 +
 arch/arm/mach-ux500/include/mach/memory.h   |    8 +
 drivers/mfd/db8500-prcmu.c                  |   67 +++++
 drivers/staging/Kconfig                     |    2 +
 drivers/staging/Makefile                    |    1 +
 drivers/staging/pasr/Kconfig                |   19 ++
 drivers/staging/pasr/Makefile               |    6 +
 drivers/staging/pasr/core.c                 |  168 +++++++++++
 drivers/staging/pasr/helper.c               |   84 ++++++
 drivers/staging/pasr/helper.h               |   16 +
 drivers/staging/pasr/init.c                 |  403 +++++++++++++++++++++++++++
 drivers/staging/pasr/ux500.c                |   58 ++++
 include/linux/pasr.h                        |  143 ++++++++++
 include/linux/ux500-pasr.h                  |   11 +
 init/main.c                                 |    8 +
 mm/page_alloc.c                             |    9 +
 19 files changed, 1199 insertions(+), 0 deletions(-)
 create mode 100644 Documentation/pasr.txt
 create mode 100644 drivers/staging/pasr/Kconfig
 create mode 100644 drivers/staging/pasr/Makefile
 create mode 100644 drivers/staging/pasr/core.c
 create mode 100644 drivers/staging/pasr/helper.c
 create mode 100644 drivers/staging/pasr/helper.h
 create mode 100644 drivers/staging/pasr/init.c
 create mode 100644 drivers/staging/pasr/ux500.c
 create mode 100644 include/linux/pasr.h
 create mode 100644 include/linux/ux500-pasr.h

-- 
1.7.8
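The counter bookkeeping described in the PASR cover letter above can be modelled in a few lines of userspace C. This is only a sketch under stated assumptions, not the code from the patch set: the struct layout, the sketch_* function names and the flat section table are hypothetical, the "masked" flag stands in for the MR16/MR17 update done through the DDR controller callback, and masking both sections of an interleaved pair at once is my reading of the "paired section" rule.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of one bank/segment ("section"); names are illustrative. */
struct pasr_section {
	uint64_t base;             /* physical base address */
	uint64_t size;             /* section size in bytes */
	uint64_t free_size;        /* unused memory size counter */
	struct pasr_section *pair; /* interleaved partner section, or NULL */
	bool masked;               /* stand-in for the MR16/MR17 refresh mask bit */
};

static bool section_unused(const struct pasr_section *s)
{
	return s->free_size == s->size;
}

/* len bytes of this section became unused. */
static void section_put(struct pasr_section *s, uint64_t len)
{
	assert(len <= s->size - s->free_size);
	s->free_size += len;
	if (!section_unused(s))
		return;
	if (s->pair) {
		/* Assumption: a pair is masked only once both halves are unused. */
		if (!section_unused(s->pair))
			return;
		s->pair->masked = true;
	}
	s->masked = true;
}

/* len bytes of this section are about to be used again. */
static void section_get(struct pasr_section *s, uint64_t len)
{
	assert(len <= s->free_size);
	if (section_unused(s)) {
		/* Counter was equal to the section size: restore refresh first. */
		s->masked = false;
		if (s->pair)
			s->pair->masked = false;
	}
	s->free_size -= len;
}

/* Apply a put or get to every section overlapping [addr, addr + size). */
static void sections_update(struct pasr_section *tbl, size_t nr,
			    uint64_t addr, uint64_t size, bool put)
{
	uint64_t end = addr + size;

	for (size_t i = 0; i < nr; i++) {
		struct pasr_section *s = &tbl[i];
		uint64_t lo = addr > s->base ? addr : s->base;
		uint64_t hi = end < s->base + s->size ? end : s->base + s->size;

		if (lo >= hi)
			continue; /* no overlap with this section */
		if (put)
			section_put(s, hi - lo);
		else
			section_get(s, hi - lo);
	}
}

void sketch_pasr_put(struct pasr_section *tbl, size_t nr,
		     uint64_t addr, uint64_t size)
{
	sections_update(tbl, nr, addr, size, true);
}

void sketch_pasr_get(struct pasr_section *tbl, size_t nr,
		     uint64_t addr, uint64_t size)
{
	sections_update(tbl, nr, addr, size, false);
}
```

With two paired 4 KiB sections that start fully in use, putting the first one back leaves both refreshed, putting the second one back masks both, and any later get on either section unmasks both again.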

13 years, 10 months


[PATCH] dma-buf: add dma_data_direction to unmap dma_buf_op

by Sumit Semwal

Some exporters may use DMA map/unmap APIs in dma-buf ops, which require enum dma_data_direction for both map and unmap operations.

Thus, the unmap dma_buf_op also needs to have enum dma_data_direction as a parameter.

Reported-by: Tomasz Stanislawski <t.stanislaws(a)samsung.com>
Signed-off-by: Sumit Semwal <sumit.semwal(a)ti.com>
---
 drivers/base/dma-buf.c  |    7 +++++--
 include/linux/dma-buf.h |    8 +++++---
 2 files changed, 10 insertions(+), 5 deletions(-)

diff --git a/drivers/base/dma-buf.c b/drivers/base/dma-buf.c
index 8afe2dd..c9a945f 100644
--- a/drivers/base/dma-buf.c
+++ b/drivers/base/dma-buf.c
@@ -271,16 +271,19 @@ EXPORT_SYMBOL_GPL(dma_buf_map_attachment);
  * dma_buf_ops.
  * @attach: [in] attachment to unmap buffer from
  * @sg_table: [in] scatterlist info of the buffer to unmap
+ * @direction: [in] direction of DMA transfer
  *
  */
 void dma_buf_unmap_attachment(struct dma_buf_attachment *attach,
-				struct sg_table *sg_table)
+				struct sg_table *sg_table,
+				enum dma_data_direction direction)
 {
 	if (WARN_ON(!attach || !attach->dmabuf || !sg_table))
 		return;
 
 	mutex_lock(&attach->dmabuf->lock);
-	attach->dmabuf->ops->unmap_dma_buf(attach, sg_table);
+	attach->dmabuf->ops->unmap_dma_buf(attach, sg_table,
+						direction);
 	mutex_unlock(&attach->dmabuf->lock);
 
 }
diff --git a/include/linux/dma-buf.h b/include/linux/dma-buf.h
index 86f6241..847b026 100644
--- a/include/linux/dma-buf.h
+++ b/include/linux/dma-buf.h
@@ -63,7 +63,8 @@ struct dma_buf_ops {
 	struct sg_table * (*map_dma_buf)(struct dma_buf_attachment *,
 						enum dma_data_direction);
 	void (*unmap_dma_buf)(struct dma_buf_attachment *,
-						struct sg_table *);
+						struct sg_table *,
+						enum dma_data_direction);
 	/* TODO: Add try_map_dma_buf version, to return immed with -EBUSY
 	 * if the call would block.
 	 */
@@ -122,7 +123,8 @@ void dma_buf_put(struct dma_buf *dmabuf);
 
 struct sg_table *dma_buf_map_attachment(struct dma_buf_attachment *,
 					enum dma_data_direction);
-void dma_buf_unmap_attachment(struct dma_buf_attachment *, struct sg_table *);
+void dma_buf_unmap_attachment(struct dma_buf_attachment *, struct sg_table *,
+				enum dma_data_direction);
 #else
 
 static inline struct dma_buf_attachment *dma_buf_attach(struct dma_buf *dmabuf,
@@ -166,7 +168,7 @@ static inline struct sg_table *dma_buf_map_attachment(
 }
 
 static inline void dma_buf_unmap_attachment(struct dma_buf_attachment *attach,
-						struct sg_table *sg)
+			struct sg_table *sg, enum dma_data_direction write)
 {
 	return;
 }
-- 
1.7.5.4
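To illustrate why an exporter wants the direction at unmap time as well, here is a small userspace model of the map/unmap pairing. It is a sketch only: every mock_* name is hypothetical and none of this is kernel API. The cache handling is a deliberately simplified stand-in for what a DMA-API-based exporter might do (write back before the device reads, invalidate after the device has written), and rejecting a mismatched direction is an assumption made here to keep the pairing visible.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace stand-ins; all mock_* names are hypothetical, not kernel API. */
enum mock_dma_dir { MOCK_BIDIRECTIONAL, MOCK_TO_DEVICE, MOCK_FROM_DEVICE };

struct mock_sg_table {
	int mapped;            /* 1 while a mapping is live */
	enum mock_dma_dir dir; /* direction given at map time */
};

struct mock_attachment {
	struct mock_sg_table sgt;
	int cache_flushes;     /* write-backs done before the device reads */
	int cache_invalidates; /* invalidates done after the device wrote */
};

/* Exporter-side map: needs the direction to decide on a write-back. */
struct mock_sg_table *mock_map(struct mock_attachment *a, enum mock_dma_dir dir)
{
	if (a->sgt.mapped)
		return NULL; /* this model allows one live mapping at a time */
	if (dir != MOCK_FROM_DEVICE)
		a->cache_flushes++;
	a->sgt.mapped = 1;
	a->sgt.dir = dir;
	return &a->sgt;
}

/*
 * Exporter-side unmap: without a direction argument the exporter would
 * have to remember it; with it, the importer states it explicitly.
 * Returns 0 on success, -1 if the direction differs from the map call.
 */
int mock_unmap(struct mock_attachment *a, struct mock_sg_table *sgt,
	       enum mock_dma_dir dir)
{
	assert(sgt == &a->sgt && sgt->mapped);
	if (dir != sgt->dir)
		return -1;
	if (dir != MOCK_TO_DEVICE)
		a->cache_invalidates++;
	sgt->mapped = 0;
	return 0;
}
```

An importer in this model calls mock_map() and mock_unmap() with the same direction value, which mirrors how the patched dma_buf_map_attachment() and dma_buf_unmap_attachment() both take an enum dma_data_direction.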

13 years, 10 months

