- Linaro-mm-sig - lists.linaro.org

[PATCH] ARM: dma-mapping: fix buffer chunk allocation order

by Marek Szyprowski

IOMMU-aware dma_alloc_attrs() implementation allocates buffers in power-of-two chunks to improve performance and take advantage of large page mappings provided by some IOMMU hardware. However current code, due to a subtle bug, allocated those chunks in the smallest-to-largest order, what completely killed all the advantages of using larger than page chunks. If a 4KiB chunk has been mapped as a first chunk, the consecutive chunks are not aligned correctly to the power-of-two which match their size and IOMMU drivers were not able to use internal mappings of size other than the 4KiB (largest common denominator of alignment and chunk size). This patch fixes this issue by changing to the correct largest-to-smallest chunk size allocation sequence. Signed-off-by: Marek Szyprowski <m.szyprowski(a)samsung.com> --- arch/arm/mm/dma-mapping.c | 2 +- 1 files changed, 1 insertions(+), 1 deletions(-) diff --git a/arch/arm/mm/dma-mapping.c b/arch/arm/mm/dma-mapping.c index d766e42..4044abc 100644 --- a/arch/arm/mm/dma-mapping.c +++ b/arch/arm/mm/dma-mapping.c @@ -1067,7 +1067,7 @@ static struct page **__iommu_alloc_buffer(struct device *dev, size_t size, gfp_t return NULL; while (count) { - int j, order = __ffs(count); + int j, order = __fls(count); pages[i] = alloc_pages(gfp | __GFP_NOWARN, order); while (!pages[i] && order) -- 1.7.1.569.g6f426

13 years, 5 months

1
0
0 0

[RFC/PATCH] fb: Add dma-buf support

by Laurent Pinchart

Add support for the dma-buf exporter role to the frame buffer API. The importer role isn't meaningful for frame buffer devices, as the frame buffer device model doesn't allow using externally allocated memory. Signed-off-by: Laurent Pinchart <laurent.pinchart(a)ideasonboard.com> --- Documentation/fb/api.txt | 36 ++++++++++++++++++++++++++++++++++++ drivers/video/fbmem.c | 36 ++++++++++++++++++++++++++++++++++++ include/linux/fb.h | 12 ++++++++++++ 3 files changed, 84 insertions(+), 0 deletions(-) diff --git a/Documentation/fb/api.txt b/Documentation/fb/api.txt index d4ff7de..f0b2173 100644 --- a/Documentation/fb/api.txt +++ b/Documentation/fb/api.txt @@ -304,3 +304,39 @@ extensions. Upon successful format configuration, drivers update the fb_fix_screeninfo type, visual and line_length fields depending on the selected format. The type and visual fields are set to FB_TYPE_FOURCC and FB_VISUAL_FOURCC respectively. + + +5. DMA buffer sharing +--------------------- + +The dma-buf kernel framework allows DMA buffers to be shared across devices +and applications. Sharing buffers across display devices and video capture or +video decoding devices allow zero-copy operation when displaying video content +produced by a hardware device such as a camera or a hardware codec. This is +crucial to achieve optimal system performances during video display. + +While dma-buf supports both exporting internally allocated memory as a dma-buf +object (known as the exporter role) and importing a dma-buf object to be used +as device memory (known as the importer role), the frame buffer API only +supports the exporter role, as the frame buffer device model doesn't support +using externally-allocated memory. + +The export a frame buffer as a dma-buf file descriptors, applications call the +FBIOGET_DMABUF ioctl. The ioctl takes a pointer to a fb_dmabuf_export +structure. + +struct fb_dmabuf_export { + __u32 fd; + __u32 flags; +}; + +The flag field specifies the flags to be used when creating the dma-buf file +descriptor. The only supported flag is O_CLOEXEC. If the call is successful, +the driver will set the fd field to a file descriptor corresponding to the +dma-buf object. + +Applications can then pass the file descriptors to another application or +another device driver. The dma-buf object is automatically reference-counted, +applications can and should close the file descriptor as soon as they don't +need it anymore. The underlying dma-buf object will not be freed before the +last device that uses the dma-buf object releases it. diff --git a/drivers/video/fbmem.c b/drivers/video/fbmem.c index 0dff12a..400e449 100644 --- a/drivers/video/fbmem.c +++ b/drivers/video/fbmem.c @@ -15,6 +15,7 @@ #include <linux/compat.h> #include <linux/types.h> +#include <linux/dma-buf.h> #include <linux/errno.h> #include <linux/kernel.h> #include <linux/major.h> @@ -1074,6 +1075,23 @@ fb_blank(struct fb_info *info, int blank) return ret; } +#ifdef CONFIG_DMA_SHARED_BUFFER +int +fb_get_dmabuf(struct fb_info *info, int flags) +{ + struct dma_buf *dmabuf; + + if (info->fbops->fb_dmabuf_export == NULL) + return -ENOTTY; + + dmabuf = info->fbops->fb_dmabuf_export(info); + if (IS_ERR(dmabuf)) + return PTR_ERR(dmabuf); + + return dma_buf_fd(dmabuf, flags); +} +#endif + static long do_fb_ioctl(struct fb_info *info, unsigned int cmd, unsigned long arg) { @@ -1084,6 +1102,7 @@ static long do_fb_ioctl(struct fb_info *info, unsigned int cmd, struct fb_cmap cmap_from; struct fb_cmap_user cmap; struct fb_event event; + struct fb_dmabuf_export dmaexp; void __user *argp = (void __user *)arg; long ret = 0; @@ -1191,6 +1210,23 @@ static long do_fb_ioctl(struct fb_info *info, unsigned int cmd, console_unlock(); unlock_fb_info(info); break; +#ifdef CONFIG_DMA_SHARED_BUFFER + case FBIOGET_DMABUF: + if (copy_from_user(&dmaexp, argp, sizeof(dmaexp))) + return -EFAULT; + + if (!lock_fb_info(info)) + return -ENODEV; + dmaexp.fd = fb_get_dmabuf(info, dmaexp.flags); + unlock_fb_info(info); + + if (dmaexp.fd < 0) + return dmaexp.fd; + + ret = copy_to_user(argp, &dmaexp, sizeof(dmaexp)) + ? -EFAULT : 0; + break; +#endif default: if (!lock_fb_info(info)) return -ENODEV; diff --git a/include/linux/fb.h b/include/linux/fb.h index ac3f1c6..c9fee75 100644 --- a/include/linux/fb.h +++ b/include/linux/fb.h @@ -39,6 +39,7 @@ #define FBIOPUT_MODEINFO 0x4617 #define FBIOGET_DISPINFO 0x4618 #define FBIO_WAITFORVSYNC _IOW('F', 0x20, __u32) +#define FBIOGET_DMABUF _IOR('F', 0x21, struct fb_dmabuf_export) #define FB_TYPE_PACKED_PIXELS 0 /* Packed Pixels */ #define FB_TYPE_PLANES 1 /* Non interleaved planes */ @@ -403,6 +404,11 @@ struct fb_cursor { #define FB_BACKLIGHT_MAX 0xFF #endif +struct fb_dmabuf_export { + __u32 fd; + __u32 flags; +}; + #ifdef __KERNEL__ #include <linux/fs.h> @@ -418,6 +424,7 @@ struct vm_area_struct; struct fb_info; struct device; struct file; +struct dma_buf; /* Definitions below are used in the parsed monitor specs */ #define FB_DPMS_ACTIVE_OFF 1 @@ -701,6 +708,11 @@ struct fb_ops { /* called at KDB enter and leave time to prepare the console */ int (*fb_debug_enter)(struct fb_info *info); int (*fb_debug_leave)(struct fb_info *info); + +#ifdef CONFIG_DMA_SHARED_BUFFER + /* Export the frame buffer as a dmabuf object */ + struct dma_buf *(*fb_dmabuf_export)(struct fb_info *info); +#endif }; #ifdef CONFIG_FB_TILEBLITTING -- Regards, Laurent Pinchart

13 years, 5 months

3
4
0 0

[PATCH/RFC 0/2] ARM: DMA-mapping: new extensions for buffer sharing (part 2)

by Marek Szyprowski

Hello, This is a continuation of the dma-mapping extensions posted in the following thread: http://thread.gmane.org/gmane.linux.kernel.mm/78644 We noticed that some advanced buffer sharing use cases usually require creating a dma mapping for the same memory buffer for more than one device. Usually also such buffer is never touched with CPU, so the data are processed by the devices. >From the DMA-mapping perspective this requires to call one of the dma_map_{page,single,sg} function for the given memory buffer a few times, for each of the devices. Each dma_map_* call performs CPU cache synchronization, what might be a time consuming operation, especially when the buffers are large. We would like to avoid any useless and time consuming operations, so that was the main reason for introducing another attribute for DMA-mapping subsystem: DMA_ATTR_SKIP_CPU_SYNC, which lets dma-mapping core to skip CPU cache synchronization in certain cases. The proposed patches have been generated on top of the ARM DMA-mapping redesign patch series on Linux v3.4-rc7. They are also available on the following GIT branch: git://git.linaro.org/people/mszyprowski/linux-dma-mapping.git 3.4-rc7-arm-dma-v10-ext with all require patches on top of vanilla v3.4-rc7 kernel. I will resend them rebased onto v3.5-rc1 soon. Best regards Marek Szyprowski Samsung Poland R&D Center Patch summary: Marek Szyprowski (2): common: DMA-mapping: add DMA_ATTR_SKIP_CPU_SYNC attribute ARM: dma-mapping: add support for DMA_ATTR_SKIP_CPU_SYNC attribute Documentation/DMA-attributes.txt | 24 ++++++++++++++++++++++++ arch/arm/mm/dma-mapping.c | 20 +++++++++++--------- include/linux/dma-attrs.h | 1 + 3 files changed, 36 insertions(+), 9 deletions(-) -- 1.7.1.569.g6f426

13 years, 6 months

3
5
0 0

[PATCH v5 0/4] Add CMA heap for ION memory manager

by benjamin.gaignard＠stericsson.com

From: Benjamin Gaignard <benjamin.gaignard(a)linaro.org> The goal of those patches is to allow ION clients (drivers or userland applications) to use Contiguous Memory Allocator (CMA). To get more info about CMA: http://lists.linaro.org/pipermail/linaro-mm-sig/2012-February/001328.html patches version 5: - port patches on android kernel 3.4 where ION use dmabuf - add ion_cma_heap_map_dma and ion_cma_heap_unmap_dma functions patches version 4: - add ION_HEAP_TYPE_DMA heap type in ion_heap_type enum. - CMA heap is now a "native" ION heap. - add ion_heap_create_full function to keep backward compatibilty. - clean up included files in CMA heap - ux500-ion is using ion_heap_create_full instead of ion_heap_create patches version 3: - add a private field in ion_heap structure instead of expose ion_device structure to all heaps - ion_cma_heap is no more a platform driver - ion_cma_heap use ion_heap private field to store the device pointer and make the link with reserved CMA regions - provide ux500-ion driver and configuration file for snowball board to give an example of how use CMA heaps patches version 2: - fix comments done by Andy Green Benjamin Gaignard (4): fix ion_platform_data definition add private field in ion_heap structure add CMA heap add test/example driver for ux500 platform arch/arm/mach-ux500/board-mop500.c | 77 ++++++++++++++++ drivers/gpu/ion/Kconfig | 5 ++ drivers/gpu/ion/Makefile | 5 +- drivers/gpu/ion/ion_cma_heap.c | 175 ++++++++++++++++++++++++++++++++++++ drivers/gpu/ion/ion_heap.c | 18 +++- drivers/gpu/ion/ion_priv.h | 13 +++ drivers/gpu/ion/ux500/Makefile | 1 + drivers/gpu/ion/ux500/ux500_ion.c | 142 +++++++++++++++++++++++++++++ include/linux/ion.h | 5 +- 9 files changed, 438 insertions(+), 3 deletions(-) create mode 100644 drivers/gpu/ion/ion_cma_heap.c create mode 100644 drivers/gpu/ion/ux500/Makefile create mode 100644 drivers/gpu/ion/ux500/ux500_ion.c -- 1.7.10

13 years, 6 months

5
11
0 0

[GIT PULL] DMA-mapping fixes for v3.5-rc3

by Marek Szyprowski

Hi Linus, I would like to ask for pulling a set of minor fixes for dma-mapping code (ARM and x86) required for Contiguous Memory Allocator (CMA) patches merged in v3.5-rc1. The following changes since commit cfaf025112d3856637ff34a767ef785ef5cf2ca9: Linux 3.5-rc2 (2012-06-08 18:40:09 -0700) with the top-most commit c080e26edc3a2a3cdfa4c430c663ee1c3bbd8fae x86: dma-mapping: fix broken allocation when dma_mask has been provided are available in the git repository at: git://git.linaro.org/people/mszyprowski/linux-dma-mapping.git fixes-for-linus Marek Szyprowski (3): ARM: mm: fix type of the arm_dma_limit global variable ARM: dma-mapping: fix debug messages in dmabounce code x86: dma-mapping: fix broken allocation when dma_mask has been provided Sachin Kamat (1): ARM: dma-mapping: Add missing static storage class specifier arch/arm/common/dmabounce.c | 16 ++++++++-------- arch/arm/mm/dma-mapping.c | 4 ++-- arch/arm/mm/init.c | 2 +- arch/arm/mm/mm.h | 2 +- arch/x86/kernel/pci-dma.c | 3 ++- 5 files changed, 14 insertions(+), 13 deletions(-) Thanks! Best regards Marek Szyprowski Samsung Poland R&D Center

13 years, 6 months

1
0
0 0

[PATCHv2 0/6] ARM: DMA-mapping: new extensions for buffer sharing

by Marek Szyprowski

Hello, This is an updated version of the patch series introducing a new features to DMA mapping subsystem to let drivers share the allocated buffers (preferably using recently introduced dma_buf framework) easy and efficient. The first extension is DMA_ATTR_NO_KERNEL_MAPPING attribute. It is intended for use with dma_{alloc, mmap, free}_attrs functions. It can be used to notify dma-mapping core that the driver will not use kernel mapping for the allocated buffer at all, so the core can skip creating it. This saves precious kernel virtual address space. Such buffer can be accessed from userspace, after calling dma_mmap_attrs() for it (a typical use case for multimedia buffers). The value returned by dma_alloc_attrs() with this attribute should be considered as a DMA cookie, which needs to be passed to dma_mmap_attrs() and dma_free_attrs() funtions. The second extension is required to let drivers to share the buffers allocated by DMA-mapping subsystem. Right now the driver gets a dma address of the allocated buffer and the kernel virtual mapping for it. If it wants to share it with other device (= map into its dma address space) it usually hacks around kernel virtual addresses to get pointers to pages or assumes that both devices share the DMA address space. Both solutions are just hacks for the special cases, which should be avoided in the final version of buffer sharing. To solve this issue in a generic way, a new call to DMA mapping has been introduced - dma_get_sgtable(). It allocates a scatter-list which describes the allocated buffer and lets the driver(s) to use it with other device(s) by calling dma_map_sg() on it. The third extension solves the performance issues which we observed with some advanced buffer sharing use cases, which require creating a dma mapping for the same memory buffer for more than one device. From the DMA-mapping perspective this requires to call one of the dma_map_{page,single,sg} function for the given memory buffer a few times, for each of the devices. Each dma_map_* call performs CPU cache synchronization, what might be a time consuming operation, especially when the buffers are large. We would like to avoid any useless and time consuming operations, so that was the main reason for introducing another attribute for DMA-mapping subsystem: DMA_ATTR_SKIP_CPU_SYNC, which lets dma-mapping core to skip CPU cache synchronization in certain cases. The proposed patches have been rebased on the latest Linux kernel v3.5-rc2 with 'ARM: replace custom consistent dma region with vmalloc' patches applied (for more information, please refer to the http://www.spinics.net/lists/arm-kernel/msg179202.html thread). The patches together with all dependences are also available on the following GIT branch: git://git.linaro.org/people/mszyprowski/linux-dma-mapping.git 3.5-rc2-dma-ext-v2 Best regards Marek Szyprowski Samsung Poland R&D Center Changelog: v2: - rebased onto v3.5-rc2 and adapted for CMA and dma-mapping changes - renamed dma_get_sgtable() to dma_get_sgtable_attrs() to match the convention of the other dma-mapping calls with attributes - added generic fallback function for dma_get_sgtable() for architectures with simple dma-mapping implementations v1: http://thread.gmane.org/gmane.linux.kernel.mm/78644 http://thread.gmane.org/gmane.linux.kernel.cross-arch/14435 (part 2) - initial version Patch summary: Marek Szyprowski (6): common: DMA-mapping: add DMA_ATTR_NO_KERNEL_MAPPING attribute ARM: dma-mapping: add support for DMA_ATTR_NO_KERNEL_MAPPING attribute common: dma-mapping: introduce dma_get_sgtable() function ARM: dma-mapping: add support for dma_get_sgtable() common: DMA-mapping: add DMA_ATTR_SKIP_CPU_SYNC attribute ARM: dma-mapping: add support for DMA_ATTR_SKIP_CPU_SYNC attribute Documentation/DMA-attributes.txt | 42 ++++++++++++++++++ arch/arm/common/dmabounce.c | 1 + arch/arm/include/asm/dma-mapping.h | 3 + arch/arm/mm/dma-mapping.c | 69 ++++++++++++++++++++++++------ drivers/base/dma-mapping.c | 18 ++++++++ include/asm-generic/dma-mapping-common.h | 18 ++++++++ include/linux/dma-attrs.h | 2 + include/linux/dma-mapping.h | 3 + 8 files changed, 142 insertions(+), 14 deletions(-) -- 1.7.1.569.g6f426

13 years, 6 months

3
14
0 0

[PATCH] ARM: dma-mapping: fix debug messages in dmabounce code

by Marek Szyprowski

This patch fixes the usage of uninitialized variables in dmabounce code intoduced by commit a227fb92 ('ARM: dma-mapping: remove offset parameter to prepare for generic dma_ops'): arch/arm/common/dmabounce.c: In function ‘dmabounce_sync_for_device’: arch/arm/common/dmabounce.c:409: warning: ‘off’ may be used uninitialized in this function arch/arm/common/dmabounce.c:407: note: ‘off’ was declared here arch/arm/common/dmabounce.c: In function ‘dmabounce_sync_for_cpu’: arch/arm/common/dmabounce.c:369: warning: ‘off’ may be used uninitialized in this function arch/arm/common/dmabounce.c:367: note: ‘off’ was declared here Signed-off-by: Marek Szyprowski <m.szyprowski(a)samsung.com> --- arch/arm/common/dmabounce.c | 16 ++++++++-------- 1 files changed, 8 insertions(+), 8 deletions(-) diff --git a/arch/arm/common/dmabounce.c b/arch/arm/common/dmabounce.c index 9d7eb53..aa07f59 100644 --- a/arch/arm/common/dmabounce.c +++ b/arch/arm/common/dmabounce.c @@ -366,8 +366,8 @@ static int __dmabounce_sync_for_cpu(struct device *dev, dma_addr_t addr, struct safe_buffer *buf; unsigned long off; - dev_dbg(dev, "%s(dma=%#x,off=%#lx,sz=%zx,dir=%x)\n", - __func__, addr, off, sz, dir); + dev_dbg(dev, "%s(dma=%#x,sz=%zx,dir=%x)\n", + __func__, addr, sz, dir); buf = find_safe_buffer_dev(dev, addr, __func__); if (!buf) @@ -377,8 +377,8 @@ static int __dmabounce_sync_for_cpu(struct device *dev, dma_addr_t addr, BUG_ON(buf->direction != dir); - dev_dbg(dev, "%s: unsafe buffer %p (dma=%#x) mapped to %p (dma=%#x)\n", - __func__, buf->ptr, virt_to_dma(dev, buf->ptr), + dev_dbg(dev, "%s: unsafe buffer %p (dma=%#x off=%#lx) mapped to %p (dma=%#x)\n", + __func__, buf->ptr, virt_to_dma(dev, buf->ptr), off, buf->safe, buf->safe_dma_addr); DO_STATS(dev->archdata.dmabounce->bounce_count++); @@ -406,8 +406,8 @@ static int __dmabounce_sync_for_device(struct device *dev, dma_addr_t addr, struct safe_buffer *buf; unsigned long off; - dev_dbg(dev, "%s(dma=%#x,off=%#lx,sz=%zx,dir=%x)\n", - __func__, addr, off, sz, dir); + dev_dbg(dev, "%s(dma=%#x,sz=%zx,dir=%x)\n", + __func__, addr, sz, dir); buf = find_safe_buffer_dev(dev, addr, __func__); if (!buf) @@ -417,8 +417,8 @@ static int __dmabounce_sync_for_device(struct device *dev, dma_addr_t addr, BUG_ON(buf->direction != dir); - dev_dbg(dev, "%s: unsafe buffer %p (dma=%#x) mapped to %p (dma=%#x)\n", - __func__, buf->ptr, virt_to_dma(dev, buf->ptr), + dev_dbg(dev, "%s: unsafe buffer %p (dma=%#x off=%#lx) mapped to %p (dma=%#x)\n", + __func__, buf->ptr, virt_to_dma(dev, buf->ptr), off, buf->safe, buf->safe_dma_addr); DO_STATS(dev->archdata.dmabounce->bounce_count++); -- 1.7.1.569.g6f426

13 years, 6 months

1
0
0 0

[PATCH][RFC] mm: Don't put CMA pages on per-cpu lists

by Laura Abbott

Currently, when freeing 0 order pages, CMA pages are treated the same as regular movable pages, which means they end up on the per-cpu page list. This means that the CMA pages are likely to be allocated for something other than contigous memory. This increases the chance that the next alloc_contig_range will fail because pages can't be migrated. Given the size of the CMA region is typically limited, it is best to optimize for success of alloc_contig_range as much as possible. Do this by freeing CMA pages directly instead of putting them on the per-cpu page lists. Signed-off-by: Laura Abbott <lauraa(a)codeaurora.org> --- mm/page_alloc.c | 3 ++- 1 files changed, 2 insertions(+), 1 deletions(-) diff --git a/mm/page_alloc.c b/mm/page_alloc.c index 0e1c6f5..c9a6483 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -1310,7 +1310,8 @@ void free_hot_cold_page(struct page *page, int cold) * excessively into the page allocator */ if (migratetype >= MIGRATE_PCPTYPES) { - if (unlikely(migratetype == MIGRATE_ISOLATE)) { + if (unlikely(migratetype == MIGRATE_ISOLATE) + || is_migrate_cma(migratetype)) { free_one_page(zone, page, 0, migratetype); goto out; } -- 1.7.8.3

13 years, 6 months

2
2
0 0

[PATCH 00/12] Support for dmabuf exporting for videobuf2

by Tomasz Stanislawski

Hello everyone, The patches adds support for DMABUF exporting to V4L2 stack. The latest support for DMABUF importing was posted in [1]. The exporter part is dependant on DMA mapping redesign [2] which is not merged into the mainline. Therefore it is posted as a separate patchset. Moreover some patches depends on vmap extension for DMABUF by Dave Airlie [3] and sg_alloc_table_from_pages function [4]. Changelog: v0: (RFC) - updated setup of VIDIOC_EXPBUF ioctl - doc updates - introduced workaround to avoid using dma_get_pages, - removed caching of exported dmabuf to avoid existence of circular reference between dmabuf and vb2_dc_buf or resource leakage - removed all 'change behaviour' patches - inital support for exporting in s5p-mfs driver - removal of vb2_mmap_pfn_range that is no longer used - use sg_alloc_table_from_pages instead of creating sglist in vb2_dc code - move attachment allocation to exporter's attach callback [1] http://thread.gmane.org/gmane.linux.drivers.video-input-infrastructure/48730 [2] http://thread.gmane.org/gmane.linux.kernel.cross-arch/14098 [3] http://permalink.gmane.org/gmane.comp.video.dri.devel/69302 [4] This patchset is rebased on 3.4-rc1 plus the following patchsets: Marek Szyprowski (1): v4l: vb2-dma-contig: let mmap method to use dma_mmap_coherent call Tomasz Stanislawski (11): v4l: add buffer exporting via dmabuf v4l: vb2: add buffer exporting via dmabuf v4l: vb2-dma-contig: add setup of sglist for MMAP buffers v4l: vb2-dma-contig: add support for DMABUF exporting v4l: vb2-dma-contig: add vmap/kmap for dmabuf exporting v4l: s5p-fimc: support for dmabuf exporting v4l: s5p-tv: mixer: support for dmabuf exporting v4l: s5p-mfc: support for dmabuf exporting v4l: vb2: remove vb2_mmap_pfn_range function v4l: vb2-dma-contig: use sg_alloc_table_from_pages function v4l: vb2-dma-contig: Move allocation of dbuf attachment to attach cb drivers/media/video/s5p-fimc/fimc-capture.c | 9 + drivers/media/video/s5p-mfc/s5p_mfc_dec.c | 13 ++ drivers/media/video/s5p-mfc/s5p_mfc_enc.c | 13 ++ drivers/media/video/s5p-tv/mixer_video.c | 10 + drivers/media/video/v4l2-compat-ioctl32.c | 1 + drivers/media/video/v4l2-dev.c | 1 + drivers/media/video/v4l2-ioctl.c | 6 + drivers/media/video/videobuf2-core.c | 67 ++++++ drivers/media/video/videobuf2-dma-contig.c | 323 ++++++++++++++++++++++----- drivers/media/video/videobuf2-memops.c | 40 ---- include/linux/videodev2.h | 26 +++ include/media/v4l2-ioctl.h | 2 + include/media/videobuf2-core.h | 2 + include/media/videobuf2-memops.h | 5 - 14 files changed, 411 insertions(+), 107 deletions(-) -- 1.7.9.5

13 years, 6 months

4
24
0 0

Re: [Linaro-mm-sig] [RFC] Synchronizing access to buffers shared with dma-buf between drivers/devices

by Erik Gilling

On Thu, Jun 7, 2012 at 4:35 AM, Tom Cooksey <tom.cooksey(a)arm.com> wrote: > The alternate is to not associate sync objects with buffers and > have them be distinct entities, exposed to userspace. This gives > userpsace more power and flexibility and might allow for use-cases > which an implicit synchronization mechanism can't satisfy - I'd > be curious to know any specifics here. Time and time again we've had problems with implicit synchronization resulting in bugs where different drivers play by slightly different implicit rules. We're convinced the best way to attack this problem is to move as much of the command and control of synchronization as possible into a single piece of code (the compositor in our case.) To facilitate this we're going to be mandating this explicit approach in the K release of Android. > However, every driver which > needs to participate in the synchronization mechanism will need > to have its interface with userspace modified to allow the sync > objects to be passed to the drivers. This seemed like a lot of > work to me, which is why I prefer the implicit approach. However > I don't actually know what work is needed and think it should be > explored. I.e. How much work is it to add explicit sync object > support to the DRM & v4l2 interfaces? > > E.g. I believe DRM/GEM's job dispatch API is "in-order" > in which case it might be easy to just add "wait for this fence" > and "signal this fence" ioctls. Seems like vmwgfx already has > something similar to this already? Could this work over having > to specify a list of sync objects to wait on and another list > of sync objects to signal for every operation (exec buf/page > flip)? What about for v4l2? If I understand you right a job submission with explicit sync would become 3 submission: 1) submit wait for pre-req fence job 2) submit render job 3) submit signal ready fence job Does DRM provide a way to ensure these 3 jobs are submitted atomically? I also expect GPU vendor would like to get clever about GPU to GPU fence dependancies. That could probably be handled entirely in the userspace GL driver. > I guess my other thought is that implicit vs explicit is not > mutually exclusive, though I'd guess there'd be interesting > deadlocks to have to debug if both were in use _at the same > time_. :-) I think this is an approach worth investigating. I'd like a way to either opt out of implicit sync or have a way to check if a dma-buf has an attached fence and detach it. Actually, that could work really well. Consider: * Each dma_buf has a single fence "slot" * on submission * the driver will extract the fence from the dma_buf and queue a wait on it. * the driver will replace that fence with it's own complettion fence before the job submission ioctl returns. * dma_buf will have two userspace ioctls: * DETACH: will return the fence as an FD to userspace and clear the fence slot in the dma_buf * ATTACH: takes a fence FD from userspace and attaches it to the dma_buf fence slot. Returns an error if the fence slot is non-empty. In the android case, we can do a detach after every submission and an attach right before. -Erik

13 years, 6 months

3
4
0 0

Re: [Linaro-mm-sig] Synchronization framework

by Maarten Lankhorst

Hey Erik, Op 07-06-12 19:35, Erik Gilling schreef: > On Thu, Jun 7, 2012 at 1:55 AM, Maarten Lankhorst > <m.b.lankhorst(a)gmail.com> wrote: >> I haven't looked at intel and amd, but from a quick glance >> it seems like they already implement fencing too, so just >> some way to synch up the fences on shared buffers seems >> like it could benefit all graphics drivers and the whole >> userspace synching could be done away with entirely. > It's important to have some level of userspace API so that GPU > generated graphics can participate in the graphics pipeline. Think of > the case where you have a software video codec streaming textures into > the GPU. It needs to know when the GPU is done with those textures so > it can reuse the buffer. > In the graphics case this problem already has to be handled without dma-buf, so adding any extra synchronization api for userspace that is only used when the bo is shared is a waste. I do agree you need some way to synch userspace though, but I think adding a new api for userspace is not the way to go. Cheers, Maarten PS: re-added cc's that seem to have fallen off from your mail.

13 years, 6 months

2
1
0 0

Re: [Linaro-mm-sig] [RFC] Synchronizing access to buffers shared with dma-buf between drivers/devices

by Erik Gilling

Tom, Is there more planned for KDS? It seems to be lacking some important features to be useful across many SoCs and graphics cards and features needed by Android. Here's some general feedback on those gaps. There is no way to share information between a buffer provider and a buffer consumer. This is important for architectures such as Tegra which have several hardware blocks that share common hardware synchronization. There's no userspace API. There are several reasons this is necessary. First, some userspace code (such as GL libs) might need to get at the private data of the sync primitive in order to generate command lists for a piece of hardware. Second is does not let userspace have control or even visibility into the graphics pipeline. The direction we are moving in Android is to put more control over synchronization into the compositor and move it out of being implemented "behind the scenes" by every vendor. Third, there's no way for a userspace process to wait on a sync primitive. There's no debugging or timing information tracked with the sync primitives. During development on new platforms and new OS versions we often have cases where the graphics pipeline stops making forward progress because one of the pieces (GPU, display, camera, dsp, userspace) has, itself, stopped making forward progress. Finding the root cause of the often hard to reproduce cases is difficult when you have to instrument every single driver. It's unclear how you would attach a dependency on a EGL fence to a dma_buf. Maybe this would be an EGL extension where you pass in the fence and the dma_buf. At Android we've been working on our own approach to this problem. I'll post those patches for discussion. Cheers, Erik

13 years, 6 months

3
6
0 0

[PATCHv6 00/13] Integration of videobuf2 with dmabuf

by Tomasz Stanislawski

Hello everyone, This patchset adds support for DMABUF [2] importing to V4L2 stack. The support for DMABUF exporting was moved to separate patchset due to dependency on patches for DMA mapping redesign by Marek Szyprowski [4]. v6: - fixed missing entry in v4l2_memory_names - fixed a bug occuring after get_user_pages failure - fixed a bug caused by using invalid vma for get_user_pages - prepare/finish no longer call dma_sync for dmabuf buffers v5: - removed change of importer/exporter behaviour - fixes vb2_dc_pages_to_sgt basing on Laurent's hints - changed pin/unpin words to lock/unlock in Doc v4: - rebased on mainline 3.4-rc2 - included missing importing support for s5p-fimc and s5p-tv - added patch for changing map/unmap for importers - fixes to Documentation part - coding style fixes - pairing {map/unmap}_dmabuf in vb2-core - fixing variable types and semantic of arguments in videobufb2-dma-contig.c v3: - rebased on mainline 3.4-rc1 - split 'code refactor' patch to multiple smaller patches - squashed fixes to Sumit's patches - patchset is no longer dependant on 'DMA mapping redesign' - separated path for handling IO and non-IO mappings - add documentation for DMABUF importing to V4L - removed all DMABUF exporter related code - removed usage of dma_get_pages extension v2: - extended VIDIOC_EXPBUF argument from integer memoffset to struct v4l2_exportbuffer - added patch that breaks DMABUF spec on (un)map_atachment callcacks but allows to work with existing implementation of DMABUF prime in DRM - all dma-contig code refactoring patches were squashed - bugfixes v1: List of changes since [1]. - support for DMA api extension dma_get_pages, the function is used to retrieve pages used to create DMA mapping. - small fixes/code cleanup to videobuf2 - added prepare and finish callbacks to vb2 allocators, it is used keep consistency between dma-cpu acess to the memory (by Marek Szyprowski) - support for exporting of DMABUF buffer in V4L2 and Videobuf2, originated from [3]. - support for dma-buf exporting in vb2-dma-contig allocator - support for DMABUF for s5p-tv and s5p-fimc (capture interface) drivers, originated from [3] - changed handling for userptr buffers (by Marek Szyprowski, Andrzej Pietrasiewicz) - let mmap method to use dma_mmap_writecombine call (by Marek Szyprowski) [1] http://thread.gmane.org/gmane.linux.drivers.video-input-infrastructure/4296… [2] https://lkml.org/lkml/2011/12/26/29 [3] http://thread.gmane.org/gmane.linux.drivers.video-input-infrastructure/3635… [4] http://thread.gmane.org/gmane.linux.kernel.cross-arch/12819 Laurent Pinchart (2): v4l: vb2-dma-contig: Shorten vb2_dma_contig prefix to vb2_dc v4l: vb2-dma-contig: Reorder functions Marek Szyprowski (2): v4l: vb2: add prepare/finish callbacks to allocators v4l: vb2-dma-contig: add prepare/finish to dma-contig allocator Sumit Semwal (4): v4l: Add DMABUF as a memory type v4l: vb2: add support for shared buffer (dma_buf) v4l: vb: remove warnings about MEMORY_DMABUF v4l: vb2-dma-contig: add support for dma_buf importing Tomasz Stanislawski (5): Documentation: media: description of DMABUF importing in V4L2 v4l: vb2-dma-contig: Remove unneeded allocation context structure v4l: vb2-dma-contig: add support for scatterlist in userptr mode v4l: s5p-tv: mixer: support for dmabuf importing v4l: s5p-fimc: support for dmabuf importing Documentation/DocBook/media/v4l/compat.xml | 4 + Documentation/DocBook/media/v4l/io.xml | 179 +++++++ .../DocBook/media/v4l/vidioc-create-bufs.xml | 1 + Documentation/DocBook/media/v4l/vidioc-qbuf.xml | 15 + Documentation/DocBook/media/v4l/vidioc-reqbufs.xml | 45 +- drivers/media/video/s5p-fimc/Kconfig | 1 + drivers/media/video/s5p-fimc/fimc-capture.c | 2 +- drivers/media/video/s5p-tv/Kconfig | 1 + drivers/media/video/s5p-tv/mixer_video.c | 2 +- drivers/media/video/v4l2-ioctl.c | 1 + drivers/media/video/videobuf-core.c | 4 + drivers/media/video/videobuf2-core.c | 207 +++++++- drivers/media/video/videobuf2-dma-contig.c | 520 +++++++++++++++++--- include/linux/videodev2.h | 7 + include/media/videobuf2-core.h | 34 ++ 15 files changed, 924 insertions(+), 99 deletions(-) -- 1.7.9.5

13 years, 6 months

8
30
0 0

Re: [Linaro-mm-sig] Synchronization framework

by Maarten Lankhorst

Hey, For intel/nouveau hybrid graphics I'm interested in this since it would allow me to synchronize between intel and nvidia cards without waiting for rendering to complete. I'm worried about the api though, nouveau and intel already have existing infrastructure to deal with fencing so exposing additional ioctl's will complicate the implementation. Would it be possible to never expose this interface to userspace but keep it inside the kernel only? nouveau_gem_ioctl_pushbuf is what's used for nouveau. If any dmabuf synch framework could hook into that then userspace would never have to act differently on shared bo's. I haven't looked at intel and amd, but from a quick glance it seems like they already implement fencing too, so just some way to synch up the fences on shared buffers seems like it could benefit all graphics drivers and the whole userspace synching could be done away with entirely. Cheers, Maarten

13 years, 6 months

1
0
0 0

Re: [Linaro-mm-sig] [RFC] Synchronizing access to buffers shared with dma-buf between drivers/devices

by Erik Gilling

On Wed, Jun 6, 2012 at 6:33 AM, John Reitan <john.reitan(a)arm.com> wrote: >> But maybe instead of inventing something new, we can just use 'struct >> kthread_work' instead of 'struct kds_callback' plus the two 'void *'s? >> If the user needs some extra args they can embed 'struct >> kthread_work' in their own struct and use container_of() magic in the >> cb. >> >> Plus this is a natural fit if you want to dispatch callbacks instead >> on a kthread_worker, which seems like it would simplify a few things >> when it comes to deadlock avoidance.. ie., not resource deadlock >> avoidance, but dispatching callbacks when some lock is held. > > That sounds like a better approach. > Will make a cleaner API, will look into it. When Tom visited us for android graphics camp in the fall he argued that there were cases where we would want to avoid an extra schedule. Consider the case where the GPU is waiting for a render buffer that the display controller is using. If that render can be kicked off w/o acquiring locks, the display's vsync IRQ handler can call release, which in turn calls the GPU callback, which in turn kicks off the render very quickly w/o having to leave IRQ context. One way around the locking issue with callbacks/async wait is to have async wait return a value to indicate that the resource has been acquired instead of calling the callback. This is the approach I chose in our sync framework. -Erik

13 years, 6 months

1
0
0 0

Re: [Linaro-mm-sig] [RFC] Synchronizing access to buffers shared with dma-buf between drivers/devices

by Rob Clark

Some comments inline.. at this stage mostly superficial issues about how the API works, etc.. not had a chance to dig too much into the implementation yet (although some of my comments about the API would change those anyways). Anyways, thanks for getting the ball rolling on this, and I think I can volunteer linaro to pick up and run w/ this if needed. On Fri, May 25, 2012 at 7:08 PM, Tom Cooksey <tom.cooksey(a)arm.com> wrote: > Hi All, > > I realise it's been a while since this was last discussed, however I'd like > to bring up kernel-side synchronization again. By kernel-side > synchronization, I mean allowing multiple drivers/devices wanting to access > the same buffer to do so without bouncing up to userspace to resolve > dependencies such as "the display controller can't start scanning out a > buffer until the GPU has finished rendering into it". As such, this is > really just an optimization which reduces latency between E.g. The GPU > finishing a rendering job and that buffer being scanned out. I appreciate > this particular example is already solved on desktop graphics cards as the > display controller and 3D core are both controlled by the same driver, so no > "generic" mechanism is needed. However on ARM SoCs, the 3D core (like an ARM > Mali) and display controller tend to be driven by separate drivers, so some > mechanism is needed to allow both drivers to synchronize their access to > buffers. > > There are multiple ways synchronization can be achieved, fences/sync objects > is one common approach, however we're presenting a different approach. > Personally, I quite like fence sync objects, however we believe it requires > a lot of userspace interfaces to be changed to pass around sync object > handles. Our hope is that the kds approach will require less effort to make > use of as no existing userspace interfaces need to be changed. E.g. To use > explicit fences, the struct drm_mode_crtc_page_flip would need a new members > to pass in the handle(s) of sync object(s) which the flip depends on (I.e. > don't flip until these fences fire). The additional benefit of our approach > is that it prevents userspace specifying dependency loops which can cause a > deadlock (see kds.txt for an explanation of what I mean here). > > I have waited until now to bring this up again because I am now able to > share the code I was trying (and failing I think) to explain previously. The > code has now been released under the GPLv2 from ARM Mali's developer portal, > however I've attempted to turn that into a patch to allow it to be discussed > on this list. Please find the patch inline below. > > While KDS defines a very generic mechanism, I am proposing that this code or > at least the concepts be merged with the existing dma_buf code, so a the > struct kds_resource members get moved to struct dma_buf, kds_* functions get > renamed to dma_buf_* functions, etc. So I guess what I'm saying is please > don't review the actual code just yet, only the concepts the code describes, > where kds_resource == dma_duf. > > > Cheers, > > Tom > > > > Author: Tom Cooksey <tom.cooksey(a)arm.com> > Date: Fri May 25 10:45:27 2012 +0100 > > Add new system to allow synchronizing access to resources > > See Documentation/kds.txt for details, however the general > idea is that this kds framework synchronizes multiple drivers > ("clients") wanting to access the same resources, where a > resource is typically a 2D image buffer being shared around > using dma-buf. > > Note: This patch is created by extracting the sources from the > tarball on <http://www.malideveloper.com/open-source-mali-gpus-lin > ux-kernel-device-drivers---dev-releases.php> and putting them in > roughly the right places. > > diff --git a/Documentation/kds.txt b/Documentation/kds.txt fwiw, I think the documentation could be made a bit more generic, but this and code style, etc shouldn't be too hard to fix > new file mode 100644 > index 0000000..a96db21 > --- /dev/null > +++ b/Documentation/kds.txt > @@ -0,0 +1,113 @@ > +# > +# (C) COPYRIGHT 2012 ARM Limited. All rights reserved. > +# > +# This program is free software and is provided to you under the terms of > the GNU General Public License version 2 > +# as published by the Free Software Foundation, and any use by you of this > program is subject to the terms of such GNU licence. > +# > +# A copy of the licence is included with the program, and can also be > obtained from Free Software > +# Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA > 02110-1301, USA. > +# > +# > + > + > +============================== > +kds - Kernel Dependency System > +============================== > + > +Introduction > +------------ > +kds provides a mechanism for clients to atomically lock down multiple > abstract resources. > +This can be done either synchronously or asynchronously. > +Abstract resources is used to allow a set of clients to use kds to control > access to any > +resource, an example is structured memory buffers. > + > +kds supports that buffer is locked for exclusive access and sharing of > buffers. > + > +kds can be built as either a integrated feature of the kernel or as a > module. > +It supports being compiled as a module both in-tree and out-of-tree. > + > + > +Concepts > +-------- > +A core concept in kds is abstract resources. > +A kds resource is just an abstraction for some client object, kds doesn't > care what it is. > +Typically EGL will consider UMP buffers as being a resource, thus each UMP > buffer has > +a kds resource for synchronization to the buffer. > + > +kds allows a client to create and destroy the abstract resource objects. > +A new resource object is made available asap (it is just a simple malloc > with some initializations), > +while destroy it requires some external synchronization. > + > +The other core concept in kds is consumer of resources. > +kds is requested to allow a client to consume a set of resources and the > client will be notified when it can consume the resources. > + > +Exclusive access allows only one client to consume a resource. > +Shared access permits multiple consumers to acceess a resource > concurrently. > + > + > +APIs > +---- > +kds provides simple resource allocate and destroy functions. > +Clients use this to instantiate and control the lifetime of the resources > kds manages. > + > +kds provides two ways to wait for resources: > +- Asynchronous wait: the client specifies a function pointer to be called > when wait is over > +- Synchronous wait: Function blocks until access is gained. > + > +The synchronous API has a timeout for the wait. > +The call can early out if a signal is delivered. > + > +After a client is done consuming the resource kds must be notified to > release the resources and let some other client take ownership. > +This is done via resource set release call. > + > +A Windows comparison: > +kds implements WaitForMultipleObjectsEx(..., bWaitAll = TRUE, ...) but also > has an asynchronous version in addition. > +kds resources can be seen as being the same as NT object manager resources. > + > +Internals > +--------- > +kds guarantees atomicity when a set of resources is operated on. > +This is implemented via a global resource lock which is taken by kds when > it updates resource objects. > + > +Internally a resource in kds is a linked list head with some flags. > + > +When a consumer requests access to a set of resources it is queued on each > of the resources. > +The link from the consumer to the resources can be triggered. Once all > links are triggered > +the registered callback is called or the blocking function returns. > +A link is considered triggered if it is the first on the list of consumers > of a resource, > +or if all the links ahead of it is marked as shared and itself is of the > type shared. > + > +When the client is done consuming the consumer object is removed from the > linked lists of > +the resources and a potential new consumer becomes the head of the > resources. > +As we add and remove consumers atomically across all resources we can > guarantee that > +we never introduces a A->B + B->A type of loops/deadlocks. > + > + > +kbase/base implementation > +------------------------- > +A HW job needs access to a set of shared resources. > +EGL tracks this and encodes the set along with the atom in the ringbuffer. > +EGL allocates a (k)base dep object to represent the dependency to the set > of resources and encodes that along with the list of resources. > +This dep object is use to create a dependency from a job chain(atom) to the > resources it needs to run. > +When kbase decodes the atom in the ringbuffer it finds the set of resources > and calls kds to request all the needed resources. > +As EGL needs to know when the kds request is delivered a new base event > object is needed: atom enqueued. This event is only delivered for atoms > which uses kds. > +The callback kbase registers trigger the dependency object described which > would trigger the existing JD system to release the job chain. > +When the atom is done kds resource set release is call to release the > resources. > + > +EGL will typically use exclusive access to the render target, while all > buffers used as input can be marked as shared. > + > + > +Buffer publish/vsync > +-------------------- > +EGL will use a separate ioctl or DRM flip to request the flip. > +If the LCD driver is integrated with kds EGL can do these operations early. > +The LCD driver must then implement the ioctl or DRM flip to be asynchronous > with kds async call. > +The LCD driver binds a kds resource to each virtual buffer (2 buffers in > case of double-buffering). > +EGL will make a dependency to the target kds resource in the kbase atom. > +After EGL receives a atom enqueued event it can ask the LCD driver to pan > to the target kds resource. > +When the atom is completed it'll release the resource and the LCD driver > will get its callback. > +In the callback it'll load the target buffer into the DMA unit of the LCD > hardware. > +The LCD driver will be the consumer of both buffers for a short period. > +The LCD driver will call kds resource set release on the previous on-screen > buffer when the next vsync/dma read end is handled. > + > + > diff --git a/drivers/misc/kds.c b/drivers/misc/kds.c > new file mode 100644 > index 0000000..8d7d55e > --- /dev/null > +++ b/drivers/misc/kds.c > @@ -0,0 +1,461 @@ > +/* > + * > + * (C) COPYRIGHT 2012 ARM Limited. All rights reserved. > + * > + * This program is free software and is provided to you under the terms of > the GNU General Public License version 2 > + * as published by the Free Software Foundation, and any use by you of this > program is subject to the terms of such GNU licence. > + * > + * A copy of the licence is included with the program, and can also be > obtained from Free Software > + * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA > 02110-1301, USA. > + * > + */ > + > + > + > +#include <linux/slab.h> > +#include <linux/list.h> > +#include <linux/mutex.h> > +#include <linux/wait.h> > +#include <linux/sched.h> > +#include <linux/err.h> > +#include <linux/module.h> > +#include <linux/workqueue.h> > +#include <linux/kds.h> > + > + > +#define KDS_LINK_TRIGGERED (1u << 0) > +#define KDS_LINK_EXCLUSIVE (1u << 1) > + > +#define KDS_IGNORED NULL > +#define KDS_INVALID (void*)-2 > +#define KDS_RESOURCE (void*)-1 > + > +struct kds_resource_set > +{ > + unsigned long num_resources; > + unsigned long pending; > + unsigned long locked_resources; > + struct kds_callback * cb; > + void * callback_parameter; > + void * callback_extra_parameter; > + struct list_head callback_link; > + struct work_struct callback_work; > + struct kds_link resources[0]; > +}; > + > +static DEFINE_MUTEX(kds_lock); > + > +int kds_callback_init(struct kds_callback * cb, int direct, kds_callback_fn > user_cb) > +{ > + int ret = 0; > + > + cb->direct = direct; > + cb->user_cb = user_cb; > + > + if (!direct) > + { > + cb->wq = alloc_workqueue("kds", WQ_UNBOUND | WQ_HIGHPRI, > WQ_UNBOUND_MAX_ACTIVE); > + if (!cb->wq) > + ret = -ENOMEM; > + } > + else > + { > + cb->wq = NULL; > + } > + > + return ret; > +} > +EXPORT_SYMBOL(kds_callback_init); > + > +void kds_callback_term(struct kds_callback * cb) > +{ > + if (!cb->direct) > + { > + BUG_ON(!cb->wq); > + destroy_workqueue(cb->wq); > + } > + else > + { > + BUG_ON(cb->wq); > + } > +} > + > +EXPORT_SYMBOL(kds_callback_term); > + > +static void kds_do_user_callback(struct kds_resource_set * rset) > +{ > + rset->cb->user_cb(rset->callback_parameter, > rset->callback_extra_parameter); > +} > + > +static void kds_queued_callback(struct work_struct * work) > +{ > + struct kds_resource_set * rset; > + rset = container_of( work, struct kds_resource_set, callback_work); > + > + kds_do_user_callback(rset); > +} > + > +static void kds_callback_perform(struct kds_resource_set * rset) > +{ > + if (rset->cb->direct) > + kds_do_user_callback(rset); > + else > + { > + int result; > + result = queue_work(rset->cb->wq, &rset->callback_work); > + /* if we got a 0 return it means we've triggered the same > rset twice! */ > + BUG_ON(!result); > + } > +} > + > +void kds_resource_init(struct kds_resource * res) > +{ > + BUG_ON(!res); > + INIT_LIST_HEAD(&res->waiters.link); > + res->waiters.parent = KDS_RESOURCE; > +} > +EXPORT_SYMBOL(kds_resource_init); > + > +void kds_resource_term(struct kds_resource * res) > +{ > + BUG_ON(!res); > + BUG_ON(!list_empty(&res->waiters.link)); > + res->waiters.parent = KDS_INVALID; > +} > +EXPORT_SYMBOL(kds_resource_term); > + > +int kds_async_waitall( > + struct kds_resource_set ** pprset, > + unsigned long flags, > + struct kds_callback * cb, > + void * callback_parameter, > + void * callback_extra_parameter, > + int number_resources, > + unsigned long * exclusive_access_bitmap, > + struct kds_resource ** resource_list) > +{ > + struct kds_resource_set * rset = NULL; > + int i; > + int triggered; > + int err = -EFAULT; > + > + BUG_ON(!pprset); > + BUG_ON(!resource_list); > + BUG_ON(!cb); > + > + mutex_lock(&kds_lock); > + > + if ((flags & KDS_FLAG_LOCKED_ACTION) == KDS_FLAG_LOCKED_FAIL) > + { > + for (i = 0; i < number_resources; i++) > + { > + if (resource_list[i]->lock_count) > + { > + err = -EBUSY; > + goto errout; > + } > + } > + } > + > + rset = kmalloc(sizeof(*rset) + number_resources * sizeof(struct > kds_link), GFP_KERNEL); > + if (!rset) > + { > + err = -ENOMEM; > + goto errout; > + } > + > + rset->num_resources = number_resources; > + rset->pending = number_resources; > + rset->locked_resources = 0; > + rset->cb = cb; > + rset->callback_parameter = callback_parameter; > + rset->callback_extra_parameter = callback_extra_parameter; > + INIT_LIST_HEAD(&rset->callback_link); > + INIT_WORK(&rset->callback_work, kds_queued_callback); > + > + for (i = 0; i < number_resources; i++) > + { > + unsigned long link_state = 0; > + > + INIT_LIST_HEAD(&rset->resources[i].link); > + rset->resources[i].parent = rset; > + > + if (test_bit(i, exclusive_access_bitmap)) > + { > + link_state |= KDS_LINK_EXCLUSIVE; > + } > + > + /* no-one else waiting? */ > + if (list_empty(&resource_list[i]->waiters.link)) > + { > + link_state |= KDS_LINK_TRIGGERED; > + rset->pending--; > + } > + /* Adding a non-exclusive and the current tail is a > triggered non-exclusive? */ > + else if (((link_state & KDS_LINK_EXCLUSIVE) == 0) && > + (((list_entry(resource_list[i]->waiters.link.prev, > struct kds_link, link)->state & (KDS_LINK_EXCLUSIVE | KDS_LINK_TRIGGERED)) > == KDS_LINK_TRIGGERED))) > + { > + link_state |= KDS_LINK_TRIGGERED; > + rset->pending--; > + } > + /* locked & ignore locked? */ > + else if ((resource_list[i]->lock_count) && ((flags & > KDS_FLAG_LOCKED_ACTION) == KDS_FLAG_LOCKED_IGNORE) ) > + { > + link_state |= KDS_LINK_TRIGGERED; > + rset->pending--; > + rset->resources[i].parent = KDS_IGNORED; /* to > disable decrementing the pending count when we get the ignored resource */ > + } > + rset->resources[i].state = link_state; > + list_add_tail(&rset->resources[i].link, > &resource_list[i]->waiters.link); > + } > + > + triggered = (rset->pending == 0); > + > + mutex_unlock(&kds_lock); > + > + /* set the pointer before the callback is called so it sees it */ > + *pprset = rset; > + > + if (triggered) > + { > + /* all resources obtained, trigger callback */ > + kds_callback_perform(rset); > + } > + > + return 0; > + > +errout: > + mutex_unlock(&kds_lock); > + return err; > +} > +EXPORT_SYMBOL(kds_async_waitall); > + > +static void wake_up_sync_call(void * callback_parameter, void * > callback_extra_parameter) > +{ > + wait_queue_head_t * wait = (wait_queue_head_t*)callback_parameter; > + wake_up(wait); > +} > + > +static struct kds_callback sync_cb = > +{ > + wake_up_sync_call, > + 1, > + NULL, > +}; > + > +struct kds_resource_set * kds_waitall( > + int number_resources, > + unsigned long * exclusive_access_bitmap, > + struct kds_resource ** resource_list, > + unsigned long jiffies_timeout) > +{ > + struct kds_resource_set * rset; > + int i; > + int triggered = 0; > + DECLARE_WAIT_QUEUE_HEAD_ONSTACK(wake); > + > + rset = kmalloc(sizeof(*rset) + number_resources * sizeof(struct > kds_link), GFP_KERNEL); > + if (!rset) > + return rset; > + > + rset->num_resources = number_resources; > + rset->pending = number_resources; > + rset->locked_resources = 1; > + INIT_LIST_HEAD(&rset->callback_link); > + INIT_WORK(&rset->callback_work, kds_queued_callback); > + > + mutex_lock(&kds_lock); > + > + for (i = 0; i < number_resources; i++) > + { > + unsigned long link_state = 0; > + > + if (likely(resource_list[i]->lock_count < ULONG_MAX)) > + resource_list[i]->lock_count++; > + else > + break; > + > + if (test_bit(i, exclusive_access_bitmap)) > + { > + link_state |= KDS_LINK_EXCLUSIVE; > + } > + > + if (list_empty(&resource_list[i]->waiters.link)) > + { > + link_state |= KDS_LINK_TRIGGERED; > + rset->pending--; > + } > + /* Adding a non-exclusive and the current tail is a > triggered non-exclusive? */ > + else if (((link_state & KDS_LINK_EXCLUSIVE) == 0) && > + (((list_entry(resource_list[i]->waiters.link.prev, > struct kds_link, link)->state & (KDS_LINK_EXCLUSIVE | KDS_LINK_TRIGGERED)) > == KDS_LINK_TRIGGERED))) > + { > + link_state |= KDS_LINK_TRIGGERED; > + rset->pending--; > + } > + > + INIT_LIST_HEAD(&rset->resources[i].link); > + rset->resources[i].parent = rset; > + rset->resources[i].state = link_state; > + list_add_tail(&rset->resources[i].link, > &resource_list[i]->waiters.link); > + } > + > + if (i < number_resources) > + { > + /* an overflow was detected, roll back */ > + while (i--) > + { > + list_del(&rset->resources[i].link); > + resource_list[i]->lock_count--; > + } > + mutex_unlock(&kds_lock); > + kfree(rset); > + return ERR_PTR(-EFAULT); > + } > + > + if (rset->pending == 0) > + triggered = 1; > + else > + { > + rset->cb = &sync_cb; > + rset->callback_parameter = &wake; > + rset->callback_extra_parameter = NULL; > + } > + > + mutex_unlock(&kds_lock); > + > + if (!triggered) > + { > + long wait_res; > + if ( KDS_WAIT_BLOCKING == jiffies_timeout ) > + { > + wait_res = wait_event_interruptible(wake, > rset->pending == 0); > + } > + else > + { > + wait_res = wait_event_interruptible_timeout(wake, > rset->pending == 0, jiffies_timeout); > + } > + if ((wait_res == -ERESTARTSYS) || (wait_res == 0)) > + { > + /* use \a kds_resource_set_release to roll back */ > + kds_resource_set_release(&rset); > + return ERR_PTR(wait_res); > + } > + } > + return rset; > +} > +EXPORT_SYMBOL(kds_waitall); > + > +void kds_resource_set_release(struct kds_resource_set ** pprset) > +{ > + struct list_head triggered = LIST_HEAD_INIT(triggered); > + struct kds_resource_set * rset; > + struct kds_resource_set * it; > + int i; > + > + BUG_ON(!pprset); > + > + mutex_lock(&kds_lock); > + > + rset = *pprset; > + if (!rset) > + { > + /* caught a race between a cancelation > + * and a completion, nothing to do */ > + mutex_unlock(&kds_lock); > + return; > + } > + > + /* clear user pointer so we'll be the only > + * thread handling the release */ > + *pprset = NULL; > + > + for (i = 0; i < rset->num_resources; i++) > + { > + struct kds_resource * resource; > + struct kds_link * it = NULL; > + > + /* fetch the previous entry on the linked list */ > + it = list_entry(rset->resources[i].link.prev, struct > kds_link, link); > + /* unlink ourself */ > + list_del(&rset->resources[i].link); > + > + /* any waiters? */ > + if (list_empty(&it->link)) > + continue; > + > + /* were we the head of the list? (head if prev is a > resource) */ > + if (it->parent != KDS_RESOURCE) > + continue; > + > + /* we were the head, find the kds_resource */ > + resource = container_of(it, struct kds_resource, waiters); > + > + if (rset->locked_resources) > + { > + resource->lock_count--; > + } > + > + /* we know there is someone waiting from the any-waiters > test above */ > + > + /* find the head of the waiting list */ > + it = list_first_entry(&resource->waiters.link, struct > kds_link, link); > + > + /* new exclusive owner? */ > + if (it->state & KDS_LINK_EXCLUSIVE) > + { > + /* link now triggered */ > + it->state |= KDS_LINK_TRIGGERED; > + /* a parent to update? */ > + if (it->parent != KDS_IGNORED) > + { > + if (0 == --it->parent->pending) > + { > + /* new owner now triggered, track > for callback later */ > + list_add(&it->parent->callback_link, > &triggered); > + } > + } > + } > + /* exclusive releasing ? */ > + else if (rset->resources[i].state & KDS_LINK_EXCLUSIVE) > + { > + /* trigger non-exclusive until end-of-list or first > exclusive */ > + list_for_each_entry(it, &resource->waiters.link, > link) > + { > + /* exclusive found, stop triggering */ > + if (it->state & KDS_LINK_EXCLUSIVE) > + break; > + > + it->state |= KDS_LINK_TRIGGERED; > + /* a parent to update? */ > + if (it->parent != KDS_IGNORED) > + { > + if (0 == --it->parent->pending) > + { > + /* new owner now triggered, > track for callback later */ > + > list_add(&it->parent->callback_link, &triggered); > + } > + } > + } > + } > + > + } > + > + mutex_unlock(&kds_lock); > + > + while (!list_empty(&triggered)) > + { > + it = list_first_entry(&triggered, struct kds_resource_set, > callback_link); > + list_del(&it->callback_link); > + kds_callback_perform(it); > + } > + > + cancel_work_sync(&rset->callback_work); > + > + /* free the resource set */ > + kfree(rset); > +} > +EXPORT_SYMBOL(kds_resource_set_release); > + > +MODULE_LICENSE("GPL"); > +MODULE_AUTHOR("ARM Ltd."); > +MODULE_VERSION("1.0"); > diff --git a/include/linux/kds.h b/include/linux/kds.h > new file mode 100644 > index 0000000..65e5706 > --- /dev/null > +++ b/include/linux/kds.h > @@ -0,0 +1,154 @@ > +/* > + * > + * (C) COPYRIGHT 2012 ARM Limited. All rights reserved. > + * > + * This program is free software and is provided to you under the terms of > the GNU General Public License version 2 > + * as published by the Free Software Foundation, and any use by you of this > program is subject to the terms of such GNU licence. > + * > + * A copy of the licence is included with the program, and can also be > obtained from Free Software > + * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA > 02110-1301, USA. > + * > + */ > + > + > + > +#ifndef _KDS_H_ > +#define _KDS_H_ > + > +#include <linux/list.h> > +#include <linux/workqueue.h> > + > +#define KDS_WAIT_BLOCKING (ULONG_MAX) > + > +/* what to do when waitall must wait for a synchronous locked resource: */ > +#define KDS_FLAG_LOCKED_FAIL (0u << 0) /* fail waitall */ > +#define KDS_FLAG_LOCKED_IGNORE (1u << 0) /* don't wait, but block other > that waits */ > +#define KDS_FLAG_LOCKED_WAIT (2u << 0) /* wait (normal */ > +#define KDS_FLAG_LOCKED_ACTION (3u << 0) /* mask to extract the action to > do on locked resources */ > + > +struct kds_resource_set; > + > +typedef void (*kds_callback_fn) (void * callback_parameter, void * > callback_extra_parameter); > + > +struct kds_callback > +{ > + kds_callback_fn user_cb; /* real cb */ > + int direct; /* do direct or queued call? */ > + struct workqueue_struct * wq; > +}; > + > +struct kds_link > +{ > + struct kds_resource_set * parent; > + struct list_head link; > + unsigned long state; > +}; > + > +struct kds_resource > +{ > + struct kds_link waiters; > + unsigned long lock_count; > +}; > + > +/* callback API */ > + > +/* Initialize a callback object. > + * > + * Typically created per context or per hw resource. > + * > + * Callbacks can be performed directly if no nested locking can > + * happen in the client. > + * > + * Nested locking can occur when a lock is held during the > kds_async_waitall or > + * kds_resource_set_release call. If the callback needs to take the same > lock > + * nested locking will happen. > + * > + * If nested locking could happen non-direct callbacks can be requested. > + * Callbacks will then be called asynchronous to the triggering call. > + */ > +int kds_callback_init(struct kds_callback * cb, int direct, kds_callback_fn > user_cb); > + > +/* Terminate the use of a callback object. > + * > + * If the callback object was set up as non-direct > + * any pending callbacks will be flushed first. > + * Note that to avoid a deadlock the lock callbacks needs > + * can't be held when a callback object is terminated. > + */ > +void kds_callback_term(struct kds_callback * cb); hmm, not hugely a fan of this.. having callbacks that might need to aquire locks be potentially called synchronously in special cases. It seems like it can get simpler if pending callbacks could hold a reference to the underlying object until the callback completes. Although not quite sure offhand how that can work without coupling kds to dmabuf or GEM.. > + > +/* resource object API */ > + > +/* initialize a resource handle for a shared resource */ > +void kds_resource_init(struct kds_resource * resource); > + > +/* > + * Will assert if the resource is being used or waited on. > + * The caller should NOT try to terminate a resource that could still have > clients. > + * After the function returns the resource is no longer known by kds. > + */ > +void kds_resource_term(struct kds_resource * resource); > + > +/* Asynchronous wait for a set of resources. > + * Callback will be called when all resources are available. > + * If all the resources was available the callback will be called before > kds_async_waitall returns. > + * So one must not hold any locks the callback code-flow can take when > calling kds_async_waitall. > + * Caller considered to own/use the resources until \a kds_rset_release is > called. > + * flags is one or more of the KDS_FLAG_* set. > + * exclusive_access_bitmap is a bitmap where a high bit means exclusive > access while a low bit means shared access. > + * Use the Linux __set_bit API, where the index of the buffer to control is > used as the bit index. > + * > + * Standard Linux error return value. > + */ > +int kds_async_waitall( > + struct kds_resource_set ** pprset, > + unsigned long flags, > + struct kds_callback * cb, > + void * callback_parameter, > + void * callback_extra_parameter, > + int number_resources, > + unsigned long * exclusive_access_bitmap, hmm, is there advantage of passing the requested resources this way, vs. just having two arrays (one for exclusive access, one for shared access)? > + struct kds_resource ** resource_list); callback_parameter + callback_extra_parameter seems a bit, well, odd. I'm guessing that is some implementation detail about mali driver peaking through? But maybe instead of inventing something new, we can just use 'struct kthread_work' instead of 'struct kds_callback' plus the two 'void *'s? If the user needs some extra args they can embed 'struct kthread_work' in their own struct and use container_of() magic in the cb. Plus this is a natural fit if you want to dispatch callbacks instead on a kthread_worker, which seems like it would simplify a few things when it comes to deadlock avoidance.. ie., not resource deadlock avoidance, but dispatching callbacks when some lock is held. /me wonders what sort of fun would otherwise happen if the cb ever indirectly called kds_resource_set_release()? > +/* Synchronous wait for a set of resources. > + * Function will return when one of these have happened: > + * - all resources have been obtained > + * - timeout lapsed while waiting > + * - a signal was received while waiting > + * > + * Caller considered to own/use the resources when the function returns. > + * Caller must release the resources using \a kds_rset_release. > + * > + * Calling this function while holding already locked resources or other > locking primitives is dangerous. > + * One must if this is needed decide on a lock order of the resources > and/or the other locking primitives > + * and always take the resources/locking primitives in the specific order. > + * > + * Use the ERR_PTR framework to decode the return value. > + * NULL = time out > + * If IS_ERR then PTR_ERR gives: > + * ERESTARTSYS = signal received, retry call after signal > + * all other values = internal error, lock failed > + * Other values = successful wait, now the owner, must call > kds_resource_set_release > + */ > +struct kds_resource_set * kds_waitall( > + int number_resources, > + unsigned long * exclusive_access_bitmap, > + struct kds_resource ** resource_list, > + unsigned long jifies_timeout); > + > +/* Release resources after use. > + * Caller must handle that other async callbacks will trigger, > + * so must avoid holding any locks a callback will take. > + * > + * The function takes a pointer to your poiner to handle a race > + * between a cancelation and a completion. > + * > + * If the caller can't guarantee that a race can't occur then > + * the passed in pointer must be the same in both call paths > + * to allow kds to manage the potential race. > + */ > +void kds_resource_set_release(struct kds_resource_set ** pprset); maybe using a worker as mentioned above for dealing w/ async cb's simplify's things? Maybe I'm a bit paranoid about locking around all of this, but basically the synchronization framework ends up having to deal w/ all the potential deadlock and recursive lock issues that we tried to hide from w/ dmabuf ;-) BR, -R > + > +#endif /* _KDS_H_ */ > + > > > > _______________________________________________ > dri-devel mailing list > dri-devel(a)lists.freedesktop.org > http://lists.freedesktop.org/mailman/listinfo/dri-devel

13 years, 6 months

2
1
0 0

[PATCH] ARM: mm: fix type of the arm_dma_limit global variable

by Marek Szyprowski

arm_dma_limit stores physical address of maximal address accessible by DMA, so the phys_addr_t type makes much more sence for it instead of u32. This patch fixes the following build warning: arch/arm/mm/init.c:380: warning: comparison of distinct pointer types lacks a cast Reported-by: Russell King <linux(a)arm.linux.org.uk> Signed-off-by: Marek Szyprowski <m.szyprowski(a)samsung.com> --- arch/arm/mm/init.c | 2 +- 1 files changed, 1 insertions(+), 1 deletions(-) diff --git a/arch/arm/mm/init.c b/arch/arm/mm/init.c index 8f5813b..39f2a86 100644 --- a/arch/arm/mm/init.c +++ b/arch/arm/mm/init.c @@ -211,7 +211,7 @@ EXPORT_SYMBOL(arm_dma_zone_size); * allocations. This must be the smallest DMA mask in the system, * so a successful GFP_DMA allocation will always satisfy this. */ -u32 arm_dma_limit; +phys_addr_t arm_dma_limit; static void __init arm_adjust_dma_zone(unsigned long *size, unsigned long *hole, unsigned long dma_size) -- 1.7.1.569.g6f426

13 years, 6 months

2
2
0 0

(no subject)

by Erik Gilling

>From c976a6bf6b144f321a7a84ca40e27a2373174e8a Mon Sep 17 00:00:00 2001 From: Erik Gilling <konkers(a)android.com> Date: Tue, 5 Jun 2012 14:39:58 -0700 Subject: [RFC 00/11] Synchronization framework Here is the synchronization framework we are work on here at Android. We chose to keep synchronization decoupled from dma_buf because we need to synchronize with resouces that do not implement dma_buf. Erik Gilling (11): sync: Add synchronization framework sw_sync: add cpu based sync driver sync: add timestamps to sync_pts sync: add debugfs support sw_sync: add debug support sync: add ioctl to get fence data sw_sync: add fill_driver_data support sync: add poll support sync: allow async waits to be canceled sync: export sync API symbols sw_sync: export sw_sync API Documentation/sync.txt | 75 +++++ drivers/base/Kconfig | 26 ++ drivers/base/Makefile | 3 + drivers/base/sw_sync.c | 259 +++++++++++++++ drivers/base/sync.c | 830 +++++++++++++++++++++++++++++++++++++++++++++++ include/linux/sw_sync.h | 58 ++++ include/linux/sync.h | 414 +++++++++++++++++++++++ 7 files changed, 1665 insertions(+), 0 deletions(-) create mode 100644 Documentation/sync.txt create mode 100644 drivers/base/sw_sync.c create mode 100644 drivers/base/sync.c create mode 100644 include/linux/sw_sync.h create mode 100644 include/linux/sync.h -- 1.7.7.3

13 years, 6 months

1
12
0 0

[GIT PULL] CMA and ARM DMA-mapping fix for v3.5-rc2

by Marek Szyprowski

Hi Linus, I would like to ask for pulling a fix for Contiguous Memory Allocator (CMA) integration for ARM architecture for v3.5-rc2. The following changes since commit f8f5701bdaf9134b1f90e5044a82c66324d2073f: Linux 3.5-rc1 (2012-06-02 18:29:26 -0700) are available in the git repository at: git://git.linaro.org/people/mszyprowski/linux-dma-mapping.git fixes-for-linus Marek Szyprowski (1): ARM: dma-mapping: remove unconditional dependency on CMA arch/arm/Kconfig | 1 - arch/arm/mm/dma-mapping.c | 10 ++++------ 2 files changed, 4 insertions(+), 7 deletions(-) Thanks! Best regards Marek Szyprowski Samsung Poland R&D Center

13 years, 6 months

1
0
0 0

[GIT PULL] CMA and ARM DMA-mapping updates for v3.5

by Marek Szyprowski

Hi Linus, I would like to ask for pulling Contiguous Memory Allocator (CMA) and ARM DMA-mapping framework updates for v3.5. The following changes since commit 76e10d158efb6d4516018846f60c2ab5501900bc: Linux 3.4 (2012-05-20 15:29:13 -0700) with the top-most commit 0f51596bd39a5c928307ffcffc9ba07f90f42a8b Merge branch 'for-next-arm-dma' into for-linus are available in the git repository at: git://git.linaro.org/people/mszyprowski/linux-dma-mapping.git for-linus These patches contains 2 major updates for DMA mapping subsystem (mainly for ARM architecture). First one is Contiguous Memory Allocator (CMA) which makes it possible for device drivers to allocate big contiguous chunks of memory after the system has booted. The main difference from the similar frameworks is the fact that CMA allows to transparently reuse memory region reserved for the big chunk allocation as a system memory, so no memory is wasted when no big chunk is allocated. Once the alloc request is issued, the framework migrates system pages to create a space for the required big chunk of physically contiguous memory. For more information one can refer to nice LWN articles: 'A reworked contiguous memory allocator': http://lwn.net/Articles/447405/ 'CMA and ARM': http://lwn.net/Articles/450286/ 'A deep dive into CMA': http://lwn.net/Articles/486301/ and the following thread with the patches and links to all previous versions: https://lkml.org/lkml/2012/4/3/204 The main client for this new framework is ARM DMA-mapping subsystem. The second part provides a complete redesign in ARM DMA-mapping subsystem. The core implementation has been changed to use common struct dma_map_ops based infrastructure with the recent updates for new dma attributes merged in v3.4-rc2. This allows to use more than one implementation of dma-mapping calls and change/select them on the struct device basis. The first client of this new infractructure is dmabounce implementation which has been completely cut out of the core, common code. The last patch of this redesign update introduces a new, experimental implementation of dma-mapping calls on top of generic IOMMU framework. This lets ARM sub-platform to transparently use IOMMU for DMA-mapping calls if one provides required IOMMU hardware. For more information please refer to the following thread: http://www.spinics.net/lists/arm-kernel/msg175729.html The last patch merges changes from both updates and provides a resolution for the conflicts which cannot be avoided when patches have been applied on the same files (mainly arch/arm/mm/dma-mapping.c). Thanks! Best regards Marek Szyprowski Samsung Poland R&D Center Patch summary: Marek Szyprowski (17): common: add dma_mmap_from_coherent() function ARM: dma-mapping: use dma_mmap_from_coherent() ARM: dma-mapping: use pr_* instread of printk ARM: dma-mapping: introduce DMA_ERROR_CODE constant ARM: dma-mapping: remove offset parameter to prepare for generic dma_ops ARM: dma-mapping: use asm-generic/dma-mapping-common.h ARM: dma-mapping: implement dma sg methods on top of any generic dma ops ARM: dma-mapping: move all dma bounce code to separate dma ops structure ARM: dma-mapping: remove redundant code and do the cleanup ARM: dma-mapping: use alloc, mmap, free from dma_ops ARM: dma-mapping: add support for IOMMU mapper mm: extract reclaim code from __alloc_pages_direct_reclaim() mm: trigger page reclaim in alloc_contig_range() to stabilise watermarks drivers: add Contiguous Memory Allocator X86: integrate CMA with DMA-mapping subsystem ARM: integrate CMA with DMA-mapping subsystem Merge branch 'for-next-arm-dma' into for-linus Mel Gorman (1): mm: Serialize access to min_free_kbytes Michal Nazarewicz (9): mm: page_alloc: remove trailing whitespace mm: compaction: introduce isolate_migratepages_range() mm: compaction: introduce map_pages() mm: compaction: introduce isolate_freepages_range() mm: compaction: export some of the functions mm: page_alloc: introduce alloc_contig_range() mm: page_alloc: change fallbacks array handling mm: mmzone: MIGRATE_CMA migration type added mm: page_isolation: MIGRATE_CMA isolation functions added Minchan Kim (1): cma: fix migration mode Vitaly Andrianov (1): ARM: dma-mapping: use PMD size for section unmap Documentation/kernel-parameters.txt | 9 + arch/Kconfig | 3 + arch/arm/Kconfig | 11 + arch/arm/common/dmabounce.c | 84 ++- arch/arm/include/asm/device.h | 4 + arch/arm/include/asm/dma-contiguous.h | 15 + arch/arm/include/asm/dma-iommu.h | 34 + arch/arm/include/asm/dma-mapping.h | 407 +++-------- arch/arm/include/asm/mach/map.h | 1 + arch/arm/kernel/setup.c | 9 +- arch/arm/mm/dma-mapping.c | 1348 ++++++++++++++++++++++++++++----- arch/arm/mm/init.c | 23 +- arch/arm/mm/mm.h | 3 + arch/arm/mm/mmu.c | 31 +- arch/arm/mm/vmregion.h | 2 +- arch/x86/Kconfig | 1 + arch/x86/include/asm/dma-contiguous.h | 13 + arch/x86/include/asm/dma-mapping.h | 5 + arch/x86/kernel/pci-dma.c | 18 +- arch/x86/kernel/pci-nommu.c | 8 +- arch/x86/kernel/setup.c | 2 + drivers/base/Kconfig | 89 +++ drivers/base/Makefile | 1 + drivers/base/dma-coherent.c | 42 + drivers/base/dma-contiguous.c | 401 ++++++++++ include/asm-generic/dma-coherent.h | 4 +- include/asm-generic/dma-contiguous.h | 28 + include/linux/device.h | 4 + include/linux/dma-contiguous.h | 110 +++ include/linux/gfp.h | 12 + include/linux/mmzone.h | 47 +- include/linux/page-isolation.h | 18 +- mm/Kconfig | 2 +- mm/Makefile | 3 +- mm/compaction.c | 418 +++++++---- mm/internal.h | 33 + mm/memory-failure.c | 2 +- mm/memory_hotplug.c | 6 +- mm/page_alloc.c | 409 +++++++++-- mm/page_isolation.c | 15 +- mm/vmstat.c | 3 + 41 files changed, 2898 insertions(+), 780 deletions(-) create mode 100644 arch/arm/include/asm/dma-contiguous.h create mode 100644 arch/arm/include/asm/dma-iommu.h create mode 100644 arch/x86/include/asm/dma-contiguous.h create mode 100644 drivers/base/dma-contiguous.c create mode 100644 include/asm-generic/dma-contiguous.h create mode 100644 include/linux/dma-contiguous.h

13 years, 6 months

3
5
0 0

[PATCHv2 0/4] ARM: replace custom consistent dma region with vmalloc

by Marek Szyprowski

Hello! Recent changes to ioremap and unification of vmalloc regions on ARM significantly reduces the possible size of the consistent dma region. They are significantly limited allowed dma coherent/writecombine allocations. This experimental patchset replaces custom consistent dma regions usage in dma-mapping framework in favour of generic vmalloc areas created on demand for each coherent and writecombine allocations. The main purpose for this patchset is to remove 2MiB limit of dma coherent/writecombine allocations. Atomic allocations are served from special pool preallocated on boot, becasue vmalloc areas cannot be reliably created in atomic context. This patch is based on vanilla v3.4-rc7 release. Atomic allocations have been tested with s3c-sdhci driver on Samsung UniversalC210 board with dmabounce code enabled to force dma_alloc_coherent() use on each dma_map_* call (some of them are made from interrupts). Best regards Marek Szyprowski Samsung Poland R&D Center Changelog: v2: - added support for atomic allocations (served from preallocated pool) - minor cleanup here and there - rebased onto v3.4-rc7 v1: http://thread.gmane.org/gmane.linux.kernel.mm/76703 - initial version Patch summary: Marek Szyprowski (4): mm: vmalloc: use const void * for caller argument mm: vmalloc: export find_vm_area() function mm: vmalloc: add VM_DMA flag to indicate areas used by dma-mapping framework ARM: dma-mapping: remove custom consistent dma region Documentation/kernel-parameters.txt | 4 + arch/arm/include/asm/dma-mapping.h | 2 +- arch/arm/mm/dma-mapping.c | 360 ++++++++++++++++------------------- include/linux/vmalloc.h | 10 +- mm/vmalloc.c | 31 ++-- 5 files changed, 185 insertions(+), 196 deletions(-) -- 1.7.10.1

13 years, 6 months

5
17
0 0

[GIT PULL]: dma-buf updates for 3.5

by Sumit Semwal

Hi Linus, Here's the first signed-tag pull request for dma-buf framework. Could you please pull the dma-buf updates for 3.5? This includes the following key items: - mmap support - vmap support - related documentation updates These are needed by various drivers to allow mmap/vmap of dma-buf shared buffers. Dave Airlie has some prime patches dependent on the vmap pull as well. Thanks and best regards, ~Sumit. The following changes since commit 76e10d158efb6d4516018846f60c2ab5501900bc: Linux 3.4 (2012-05-20 15:29:13 -0700) are available in the git repository at: ssh://sumitsemwal@git.linaro.org/~/public_git/linux-dma-buf.git tags/tag-for-linus-3.5 for you to fetch changes up to b25b086d23eb852bf3cfdeb60409b4967ebb3c0c: dma-buf: add initial vmap documentation (2012-05-25 12:51:11 +0530) ---------------------------------------------------------------- dma-buf updates for 3.5 ---------------------------------------------------------------- Daniel Vetter (1): dma-buf: mmap support Dave Airlie (2): dma-buf: add vmap interface dma-buf: add initial vmap documentation Sumit Semwal (1): dma-buf: minor documentation fixes. Documentation/dma-buf-sharing.txt | 109 ++++++++++++++++++++++++++++++++++--- drivers/base/dma-buf.c | 99 ++++++++++++++++++++++++++++++++- include/linux/dma-buf.h | 33 +++++++++++ 3 files changed, 233 insertions(+), 8 deletions(-)

13 years, 6 months

3
4
0 0

[RFC] Synchronizing access to buffers shared with dma-buf between drivers/devices

by Tom Cooksey

Hi All, I realise it's been a while since this was last discussed, however I'd like to bring up kernel-side synchronization again. By kernel-side synchronization, I mean allowing multiple drivers/devices wanting to access the same buffer to do so without bouncing up to userspace to resolve dependencies such as "the display controller can't start scanning out a buffer until the GPU has finished rendering into it". As such, this is really just an optimization which reduces latency between E.g. The GPU finishing a rendering job and that buffer being scanned out. I appreciate this particular example is already solved on desktop graphics cards as the display controller and 3D core are both controlled by the same driver, so no "generic" mechanism is needed. However on ARM SoCs, the 3D core (like an ARM Mali) and display controller tend to be driven by separate drivers, so some mechanism is needed to allow both drivers to synchronize their access to buffers. There are multiple ways synchronization can be achieved, fences/sync objects is one common approach, however we're presenting a different approach. Personally, I quite like fence sync objects, however we believe it requires a lot of userspace interfaces to be changed to pass around sync object handles. Our hope is that the kds approach will require less effort to make use of as no existing userspace interfaces need to be changed. E.g. To use explicit fences, the struct drm_mode_crtc_page_flip would need a new members to pass in the handle(s) of sync object(s) which the flip depends on (I.e. don't flip until these fences fire). The additional benefit of our approach is that it prevents userspace specifying dependency loops which can cause a deadlock (see kds.txt for an explanation of what I mean here). I have waited until now to bring this up again because I am now able to share the code I was trying (and failing I think) to explain previously. The code has now been released under the GPLv2 from ARM Mali's developer portal, however I've attempted to turn that into a patch to allow it to be discussed on this list. Please find the patch inline below. While KDS defines a very generic mechanism, I am proposing that this code or at least the concepts be merged with the existing dma_buf code, so a the struct kds_resource members get moved to struct dma_buf, kds_* functions get renamed to dma_buf_* functions, etc. So I guess what I'm saying is please don't review the actual code just yet, only the concepts the code describes, where kds_resource == dma_duf. Cheers, Tom Author: Tom Cooksey <tom.cooksey(a)arm.com> Date: Fri May 25 10:45:27 2012 +0100 Add new system to allow synchronizing access to resources See Documentation/kds.txt for details, however the general idea is that this kds framework synchronizes multiple drivers ("clients") wanting to access the same resources, where a resource is typically a 2D image buffer being shared around using dma-buf. Note: This patch is created by extracting the sources from the tarball on <http://www.malideveloper.com/open-source-mali-gpus-lin ux-kernel-device-drivers---dev-releases.php> and putting them in roughly the right places. diff --git a/Documentation/kds.txt b/Documentation/kds.txt new file mode 100644 index 0000000..a96db21 --- /dev/null +++ b/Documentation/kds.txt @@ -0,0 +1,113 @@ +# +# (C) COPYRIGHT 2012 ARM Limited. All rights reserved. +# +# This program is free software and is provided to you under the terms of the GNU General Public License version 2 +# as published by the Free Software Foundation, and any use by you of this program is subject to the terms of such GNU licence. +# +# A copy of the licence is included with the program, and can also be obtained from Free Software +# Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA. +# +# + + +============================== +kds - Kernel Dependency System +============================== + +Introduction +------------ +kds provides a mechanism for clients to atomically lock down multiple abstract resources. +This can be done either synchronously or asynchronously. +Abstract resources is used to allow a set of clients to use kds to control access to any +resource, an example is structured memory buffers. + +kds supports that buffer is locked for exclusive access and sharing of buffers. + +kds can be built as either a integrated feature of the kernel or as a module. +It supports being compiled as a module both in-tree and out-of-tree. + + +Concepts +-------- +A core concept in kds is abstract resources. +A kds resource is just an abstraction for some client object, kds doesn't care what it is. +Typically EGL will consider UMP buffers as being a resource, thus each UMP buffer has +a kds resource for synchronization to the buffer. + +kds allows a client to create and destroy the abstract resource objects. +A new resource object is made available asap (it is just a simple malloc with some initializations), +while destroy it requires some external synchronization. + +The other core concept in kds is consumer of resources. +kds is requested to allow a client to consume a set of resources and the client will be notified when it can consume the resources. + +Exclusive access allows only one client to consume a resource. +Shared access permits multiple consumers to acceess a resource concurrently. + + +APIs +---- +kds provides simple resource allocate and destroy functions. +Clients use this to instantiate and control the lifetime of the resources kds manages. + +kds provides two ways to wait for resources: +- Asynchronous wait: the client specifies a function pointer to be called when wait is over +- Synchronous wait: Function blocks until access is gained. + +The synchronous API has a timeout for the wait. +The call can early out if a signal is delivered. + +After a client is done consuming the resource kds must be notified to release the resources and let some other client take ownership. +This is done via resource set release call. + +A Windows comparison: +kds implements WaitForMultipleObjectsEx(..., bWaitAll = TRUE, ...) but also has an asynchronous version in addition. +kds resources can be seen as being the same as NT object manager resources. + +Internals +--------- +kds guarantees atomicity when a set of resources is operated on. +This is implemented via a global resource lock which is taken by kds when it updates resource objects. + +Internally a resource in kds is a linked list head with some flags. + +When a consumer requests access to a set of resources it is queued on each of the resources. +The link from the consumer to the resources can be triggered. Once all links are triggered +the registered callback is called or the blocking function returns. +A link is considered triggered if it is the first on the list of consumers of a resource, +or if all the links ahead of it is marked as shared and itself is of the type shared. + +When the client is done consuming the consumer object is removed from the linked lists of +the resources and a potential new consumer becomes the head of the resources. +As we add and remove consumers atomically across all resources we can guarantee that +we never introduces a A->B + B->A type of loops/deadlocks. + + +kbase/base implementation +------------------------- +A HW job needs access to a set of shared resources. +EGL tracks this and encodes the set along with the atom in the ringbuffer. +EGL allocates a (k)base dep object to represent the dependency to the set of resources and encodes that along with the list of resources. +This dep object is use to create a dependency from a job chain(atom) to the resources it needs to run. +When kbase decodes the atom in the ringbuffer it finds the set of resources and calls kds to request all the needed resources. +As EGL needs to know when the kds request is delivered a new base event object is needed: atom enqueued. This event is only delivered for atoms which uses kds. +The callback kbase registers trigger the dependency object described which would trigger the existing JD system to release the job chain. +When the atom is done kds resource set release is call to release the resources. + +EGL will typically use exclusive access to the render target, while all buffers used as input can be marked as shared. + + +Buffer publish/vsync +-------------------- +EGL will use a separate ioctl or DRM flip to request the flip. +If the LCD driver is integrated with kds EGL can do these operations early. +The LCD driver must then implement the ioctl or DRM flip to be asynchronous with kds async call. +The LCD driver binds a kds resource to each virtual buffer (2 buffers in case of double-buffering). +EGL will make a dependency to the target kds resource in the kbase atom. +After EGL receives a atom enqueued event it can ask the LCD driver to pan to the target kds resource. +When the atom is completed it'll release the resource and the LCD driver will get its callback. +In the callback it'll load the target buffer into the DMA unit of the LCD hardware. +The LCD driver will be the consumer of both buffers for a short period. +The LCD driver will call kds resource set release on the previous on-screen buffer when the next vsync/dma read end is handled. + + diff --git a/drivers/misc/kds.c b/drivers/misc/kds.c new file mode 100644 index 0000000..8d7d55e --- /dev/null +++ b/drivers/misc/kds.c @@ -0,0 +1,461 @@ +/* + * + * (C) COPYRIGHT 2012 ARM Limited. All rights reserved. + * + * This program is free software and is provided to you under the terms of the GNU General Public License version 2 + * as published by the Free Software Foundation, and any use by you of this program is subject to the terms of such GNU licence. + * + * A copy of the licence is included with the program, and can also be obtained from Free Software + * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA. + * + */ + + + +#include <linux/slab.h> +#include <linux/list.h> +#include <linux/mutex.h> +#include <linux/wait.h> +#include <linux/sched.h> +#include <linux/err.h> +#include <linux/module.h> +#include <linux/workqueue.h> +#include <linux/kds.h> + + +#define KDS_LINK_TRIGGERED (1u << 0) +#define KDS_LINK_EXCLUSIVE (1u << 1) + +#define KDS_IGNORED NULL +#define KDS_INVALID (void*)-2 +#define KDS_RESOURCE (void*)-1 + +struct kds_resource_set +{ + unsigned long num_resources; + unsigned long pending; + unsigned long locked_resources; + struct kds_callback * cb; + void * callback_parameter; + void * callback_extra_parameter; + struct list_head callback_link; + struct work_struct callback_work; + struct kds_link resources[0]; +}; + +static DEFINE_MUTEX(kds_lock); + +int kds_callback_init(struct kds_callback * cb, int direct, kds_callback_fn user_cb) +{ + int ret = 0; + + cb->direct = direct; + cb->user_cb = user_cb; + + if (!direct) + { + cb->wq = alloc_workqueue("kds", WQ_UNBOUND | WQ_HIGHPRI, WQ_UNBOUND_MAX_ACTIVE); + if (!cb->wq) + ret = -ENOMEM; + } + else + { + cb->wq = NULL; + } + + return ret; +} +EXPORT_SYMBOL(kds_callback_init); + +void kds_callback_term(struct kds_callback * cb) +{ + if (!cb->direct) + { + BUG_ON(!cb->wq); + destroy_workqueue(cb->wq); + } + else + { + BUG_ON(cb->wq); + } +} + +EXPORT_SYMBOL(kds_callback_term); + +static void kds_do_user_callback(struct kds_resource_set * rset) +{ + rset->cb->user_cb(rset->callback_parameter, rset->callback_extra_parameter); +} + +static void kds_queued_callback(struct work_struct * work) +{ + struct kds_resource_set * rset; + rset = container_of( work, struct kds_resource_set, callback_work); + + kds_do_user_callback(rset); +} + +static void kds_callback_perform(struct kds_resource_set * rset) +{ + if (rset->cb->direct) + kds_do_user_callback(rset); + else + { + int result; + result = queue_work(rset->cb->wq, &rset->callback_work); + /* if we got a 0 return it means we've triggered the same rset twice! */ + BUG_ON(!result); + } +} + +void kds_resource_init(struct kds_resource * res) +{ + BUG_ON(!res); + INIT_LIST_HEAD(&res->waiters.link); + res->waiters.parent = KDS_RESOURCE; +} +EXPORT_SYMBOL(kds_resource_init); + +void kds_resource_term(struct kds_resource * res) +{ + BUG_ON(!res); + BUG_ON(!list_empty(&res->waiters.link)); + res->waiters.parent = KDS_INVALID; +} +EXPORT_SYMBOL(kds_resource_term); + +int kds_async_waitall( + struct kds_resource_set ** pprset, + unsigned long flags, + struct kds_callback * cb, + void * callback_parameter, + void * callback_extra_parameter, + int number_resources, + unsigned long * exclusive_access_bitmap, + struct kds_resource ** resource_list) +{ + struct kds_resource_set * rset = NULL; + int i; + int triggered; + int err = -EFAULT; + + BUG_ON(!pprset); + BUG_ON(!resource_list); + BUG_ON(!cb); + + mutex_lock(&kds_lock); + + if ((flags & KDS_FLAG_LOCKED_ACTION) == KDS_FLAG_LOCKED_FAIL) + { + for (i = 0; i < number_resources; i++) + { + if (resource_list[i]->lock_count) + { + err = -EBUSY; + goto errout; + } + } + } + + rset = kmalloc(sizeof(*rset) + number_resources * sizeof(struct kds_link), GFP_KERNEL); + if (!rset) + { + err = -ENOMEM; + goto errout; + } + + rset->num_resources = number_resources; + rset->pending = number_resources; + rset->locked_resources = 0; + rset->cb = cb; + rset->callback_parameter = callback_parameter; + rset->callback_extra_parameter = callback_extra_parameter; + INIT_LIST_HEAD(&rset->callback_link); + INIT_WORK(&rset->callback_work, kds_queued_callback); + + for (i = 0; i < number_resources; i++) + { + unsigned long link_state = 0; + + INIT_LIST_HEAD(&rset->resources[i].link); + rset->resources[i].parent = rset; + + if (test_bit(i, exclusive_access_bitmap)) + { + link_state |= KDS_LINK_EXCLUSIVE; + } + + /* no-one else waiting? */ + if (list_empty(&resource_list[i]->waiters.link)) + { + link_state |= KDS_LINK_TRIGGERED; + rset->pending--; + } + /* Adding a non-exclusive and the current tail is a triggered non-exclusive? */ + else if (((link_state & KDS_LINK_EXCLUSIVE) == 0) && + (((list_entry(resource_list[i]->waiters.link.prev, struct kds_link, link)->state & (KDS_LINK_EXCLUSIVE | KDS_LINK_TRIGGERED)) == KDS_LINK_TRIGGERED))) + { + link_state |= KDS_LINK_TRIGGERED; + rset->pending--; + } + /* locked & ignore locked? */ + else if ((resource_list[i]->lock_count) && ((flags & KDS_FLAG_LOCKED_ACTION) == KDS_FLAG_LOCKED_IGNORE) ) + { + link_state |= KDS_LINK_TRIGGERED; + rset->pending--; + rset->resources[i].parent = KDS_IGNORED; /* to disable decrementing the pending count when we get the ignored resource */ + } + rset->resources[i].state = link_state; + list_add_tail(&rset->resources[i].link, &resource_list[i]->waiters.link); + } + + triggered = (rset->pending == 0); + + mutex_unlock(&kds_lock); + + /* set the pointer before the callback is called so it sees it */ + *pprset = rset; + + if (triggered) + { + /* all resources obtained, trigger callback */ + kds_callback_perform(rset); + } + + return 0; + +errout: + mutex_unlock(&kds_lock); + return err; +} +EXPORT_SYMBOL(kds_async_waitall); + +static void wake_up_sync_call(void * callback_parameter, void * callback_extra_parameter) +{ + wait_queue_head_t * wait = (wait_queue_head_t*)callback_parameter; + wake_up(wait); +} + +static struct kds_callback sync_cb = +{ + wake_up_sync_call, + 1, + NULL, +}; + +struct kds_resource_set * kds_waitall( + int number_resources, + unsigned long * exclusive_access_bitmap, + struct kds_resource ** resource_list, + unsigned long jiffies_timeout) +{ + struct kds_resource_set * rset; + int i; + int triggered = 0; + DECLARE_WAIT_QUEUE_HEAD_ONSTACK(wake); + + rset = kmalloc(sizeof(*rset) + number_resources * sizeof(struct kds_link), GFP_KERNEL); + if (!rset) + return rset; + + rset->num_resources = number_resources; + rset->pending = number_resources; + rset->locked_resources = 1; + INIT_LIST_HEAD(&rset->callback_link); + INIT_WORK(&rset->callback_work, kds_queued_callback); + + mutex_lock(&kds_lock); + + for (i = 0; i < number_resources; i++) + { + unsigned long link_state = 0; + + if (likely(resource_list[i]->lock_count < ULONG_MAX)) + resource_list[i]->lock_count++; + else + break; + + if (test_bit(i, exclusive_access_bitmap)) + { + link_state |= KDS_LINK_EXCLUSIVE; + } + + if (list_empty(&resource_list[i]->waiters.link)) + { + link_state |= KDS_LINK_TRIGGERED; + rset->pending--; + } + /* Adding a non-exclusive and the current tail is a triggered non-exclusive? */ + else if (((link_state & KDS_LINK_EXCLUSIVE) == 0) && + (((list_entry(resource_list[i]->waiters.link.prev, struct kds_link, link)->state & (KDS_LINK_EXCLUSIVE | KDS_LINK_TRIGGERED)) == KDS_LINK_TRIGGERED))) + { + link_state |= KDS_LINK_TRIGGERED; + rset->pending--; + } + + INIT_LIST_HEAD(&rset->resources[i].link); + rset->resources[i].parent = rset; + rset->resources[i].state = link_state; + list_add_tail(&rset->resources[i].link, &resource_list[i]->waiters.link); + } + + if (i < number_resources) + { + /* an overflow was detected, roll back */ + while (i--) + { + list_del(&rset->resources[i].link); + resource_list[i]->lock_count--; + } + mutex_unlock(&kds_lock); + kfree(rset); + return ERR_PTR(-EFAULT); + } + + if (rset->pending == 0) + triggered = 1; + else + { + rset->cb = &sync_cb; + rset->callback_parameter = &wake; + rset->callback_extra_parameter = NULL; + } + + mutex_unlock(&kds_lock); + + if (!triggered) + { + long wait_res; + if ( KDS_WAIT_BLOCKING == jiffies_timeout ) + { + wait_res = wait_event_interruptible(wake, rset->pending == 0); + } + else + { + wait_res = wait_event_interruptible_timeout(wake, rset->pending == 0, jiffies_timeout); + } + if ((wait_res == -ERESTARTSYS) || (wait_res == 0)) + { + /* use \a kds_resource_set_release to roll back */ + kds_resource_set_release(&rset); + return ERR_PTR(wait_res); + } + } + return rset; +} +EXPORT_SYMBOL(kds_waitall); + +void kds_resource_set_release(struct kds_resource_set ** pprset) +{ + struct list_head triggered = LIST_HEAD_INIT(triggered); + struct kds_resource_set * rset; + struct kds_resource_set * it; + int i; + + BUG_ON(!pprset); + + mutex_lock(&kds_lock); + + rset = *pprset; + if (!rset) + { + /* caught a race between a cancelation + * and a completion, nothing to do */ + mutex_unlock(&kds_lock); + return; + } + + /* clear user pointer so we'll be the only + * thread handling the release */ + *pprset = NULL; + + for (i = 0; i < rset->num_resources; i++) + { + struct kds_resource * resource; + struct kds_link * it = NULL; + + /* fetch the previous entry on the linked list */ + it = list_entry(rset->resources[i].link.prev, struct kds_link, link); + /* unlink ourself */ + list_del(&rset->resources[i].link); + + /* any waiters? */ + if (list_empty(&it->link)) + continue; + + /* were we the head of the list? (head if prev is a resource) */ + if (it->parent != KDS_RESOURCE) + continue; + + /* we were the head, find the kds_resource */ + resource = container_of(it, struct kds_resource, waiters); + + if (rset->locked_resources) + { + resource->lock_count--; + } + + /* we know there is someone waiting from the any-waiters test above */ + + /* find the head of the waiting list */ + it = list_first_entry(&resource->waiters.link, struct kds_link, link); + + /* new exclusive owner? */ + if (it->state & KDS_LINK_EXCLUSIVE) + { + /* link now triggered */ + it->state |= KDS_LINK_TRIGGERED; + /* a parent to update? */ + if (it->parent != KDS_IGNORED) + { + if (0 == --it->parent->pending) + { + /* new owner now triggered, track for callback later */ + list_add(&it->parent->callback_link, &triggered); + } + } + } + /* exclusive releasing ? */ + else if (rset->resources[i].state & KDS_LINK_EXCLUSIVE) + { + /* trigger non-exclusive until end-of-list or first exclusive */ + list_for_each_entry(it, &resource->waiters.link, link) + { + /* exclusive found, stop triggering */ + if (it->state & KDS_LINK_EXCLUSIVE) + break; + + it->state |= KDS_LINK_TRIGGERED; + /* a parent to update? */ + if (it->parent != KDS_IGNORED) + { + if (0 == --it->parent->pending) + { + /* new owner now triggered, track for callback later */ + list_add(&it->parent->callback_link, &triggered); + } + } + } + } + + } + + mutex_unlock(&kds_lock); + + while (!list_empty(&triggered)) + { + it = list_first_entry(&triggered, struct kds_resource_set, callback_link); + list_del(&it->callback_link); + kds_callback_perform(it); + } + + cancel_work_sync(&rset->callback_work); + + /* free the resource set */ + kfree(rset); +} +EXPORT_SYMBOL(kds_resource_set_release); + +MODULE_LICENSE("GPL"); +MODULE_AUTHOR("ARM Ltd."); +MODULE_VERSION("1.0"); diff --git a/include/linux/kds.h b/include/linux/kds.h new file mode 100644 index 0000000..65e5706 --- /dev/null +++ b/include/linux/kds.h @@ -0,0 +1,154 @@ +/* + * + * (C) COPYRIGHT 2012 ARM Limited. All rights reserved. + * + * This program is free software and is provided to you under the terms of the GNU General Public License version 2 + * as published by the Free Software Foundation, and any use by you of this program is subject to the terms of such GNU licence. + * + * A copy of the licence is included with the program, and can also be obtained from Free Software + * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA. + * + */ + + + +#ifndef _KDS_H_ +#define _KDS_H_ + +#include <linux/list.h> +#include <linux/workqueue.h> + +#define KDS_WAIT_BLOCKING (ULONG_MAX) + +/* what to do when waitall must wait for a synchronous locked resource: */ +#define KDS_FLAG_LOCKED_FAIL (0u << 0) /* fail waitall */ +#define KDS_FLAG_LOCKED_IGNORE (1u << 0) /* don't wait, but block other that waits */ +#define KDS_FLAG_LOCKED_WAIT (2u << 0) /* wait (normal */ +#define KDS_FLAG_LOCKED_ACTION (3u << 0) /* mask to extract the action to do on locked resources */ + +struct kds_resource_set; + +typedef void (*kds_callback_fn) (void * callback_parameter, void * callback_extra_parameter); + +struct kds_callback +{ + kds_callback_fn user_cb; /* real cb */ + int direct; /* do direct or queued call? */ + struct workqueue_struct * wq; +}; + +struct kds_link +{ + struct kds_resource_set * parent; + struct list_head link; + unsigned long state; +}; + +struct kds_resource +{ + struct kds_link waiters; + unsigned long lock_count; +}; + +/* callback API */ + +/* Initialize a callback object. + * + * Typically created per context or per hw resource. + * + * Callbacks can be performed directly if no nested locking can + * happen in the client. + * + * Nested locking can occur when a lock is held during the kds_async_waitall or + * kds_resource_set_release call. If the callback needs to take the same lock + * nested locking will happen. + * + * If nested locking could happen non-direct callbacks can be requested. + * Callbacks will then be called asynchronous to the triggering call. + */ +int kds_callback_init(struct kds_callback * cb, int direct, kds_callback_fn user_cb); + +/* Terminate the use of a callback object. + * + * If the callback object was set up as non-direct + * any pending callbacks will be flushed first. + * Note that to avoid a deadlock the lock callbacks needs + * can't be held when a callback object is terminated. + */ +void kds_callback_term(struct kds_callback * cb); + + +/* resource object API */ + +/* initialize a resource handle for a shared resource */ +void kds_resource_init(struct kds_resource * resource); + +/* + * Will assert if the resource is being used or waited on. + * The caller should NOT try to terminate a resource that could still have clients. + * After the function returns the resource is no longer known by kds. + */ +void kds_resource_term(struct kds_resource * resource); + +/* Asynchronous wait for a set of resources. + * Callback will be called when all resources are available. + * If all the resources was available the callback will be called before kds_async_waitall returns. + * So one must not hold any locks the callback code-flow can take when calling kds_async_waitall. + * Caller considered to own/use the resources until \a kds_rset_release is called. + * flags is one or more of the KDS_FLAG_* set. + * exclusive_access_bitmap is a bitmap where a high bit means exclusive access while a low bit means shared access. + * Use the Linux __set_bit API, where the index of the buffer to control is used as the bit index. + * + * Standard Linux error return value. + */ +int kds_async_waitall( + struct kds_resource_set ** pprset, + unsigned long flags, + struct kds_callback * cb, + void * callback_parameter, + void * callback_extra_parameter, + int number_resources, + unsigned long * exclusive_access_bitmap, + struct kds_resource ** resource_list); + +/* Synchronous wait for a set of resources. + * Function will return when one of these have happened: + * - all resources have been obtained + * - timeout lapsed while waiting + * - a signal was received while waiting + * + * Caller considered to own/use the resources when the function returns. + * Caller must release the resources using \a kds_rset_release. + * + * Calling this function while holding already locked resources or other locking primitives is dangerous. + * One must if this is needed decide on a lock order of the resources and/or the other locking primitives + * and always take the resources/locking primitives in the specific order. + * + * Use the ERR_PTR framework to decode the return value. + * NULL = time out + * If IS_ERR then PTR_ERR gives: + * ERESTARTSYS = signal received, retry call after signal + * all other values = internal error, lock failed + * Other values = successful wait, now the owner, must call kds_resource_set_release + */ +struct kds_resource_set * kds_waitall( + int number_resources, + unsigned long * exclusive_access_bitmap, + struct kds_resource ** resource_list, + unsigned long jifies_timeout); + +/* Release resources after use. + * Caller must handle that other async callbacks will trigger, + * so must avoid holding any locks a callback will take. + * + * The function takes a pointer to your poiner to handle a race + * between a cancelation and a completion. + * + * If the caller can't guarantee that a race can't occur then + * the passed in pointer must be the same in both call paths + * to allow kds to manage the potential race. + */ +void kds_resource_set_release(struct kds_resource_set ** pprset); + +#endif /* _KDS_H_ */ +

13 years, 6 months

1
0
0 0

[PATCH] dma-buf: fix disabled vmap function

by Dave Airlie

From: Dave Airlie <airlied(a)redhat.com> include/linux/dma-buf.h: In function ‘dma_buf_vmap’: include/linux/dma-buf.h:260:1: warning: no return statement in function returning non-void [-Wreturn-type] Reported-by: wfg(a)linux.intel.com Signed-off-by: Dave Airlie <airlied(a)redhat.com> --- include/linux/dma-buf.h | 1 + 1 files changed, 1 insertions(+), 0 deletions(-) diff --git a/include/linux/dma-buf.h b/include/linux/dma-buf.h index d8c2865..506bb7b 100644 --- a/include/linux/dma-buf.h +++ b/include/linux/dma-buf.h @@ -257,6 +257,7 @@ static inline void dma_buf_kunmap(struct dma_buf *dmabuf, static inline void *dma_buf_vmap(struct dma_buf *dmabuf) { + return NULL; } static inline void dma_buf_vunmap(struct dma_buf *dmabuf, void *vaddr) -- 1.7.6

13 years, 6 months

2
2
0 0

Re: [Linaro-mm-sig] New "xf86-video-armsoc" DDX driver

by Dave Airlie

> > For the last few months we (ARM MPD... "The Mali guys") have been working on > getting X.Org up and running with Mali T6xx (ARM's next-generation GPU IP). > The approach is very similar (well identical I think) to how things work on > OMAP: We use a DRM driver to manage the display controller via KMS. The KMS > driver also allocates both scan-out and pixmap/back buffers via the > DRM_IOCTL_MODE_CREATE_DUMB ioctl which is internally implemented with GEM. > When returning buffers to DRI clients, the x-server uses flink to get a > global handle to a buffer which it passes back to the DRI client (in our > case the Mali-T600 X11 EGL winsys). The client then uses the new PRIME > ioctls to export the GEM buffer it received from the x-server to a dma_buf > fd. This fd is then passed into the T6xx kernel driver via our own job > dispatch user/kernel API (we're not using DRM for driving the GPU, only the > display controller). So using dumb in this was is probably a bit of an abuse, since dumb is defined to provide buffers not to be used for acceleration hw. Since when we allocate dumb buffers, we can't know what special hw layouts are required (tiling etc) for optimal performance for accel. The logic to work that out is rarely generic. > > http://git.linaro.org/gitweb?p=arm/xorg/driver/xf86-video-armsoc.git;a=summa > ry > > Note: When we originally spoke to Rob Clark about this, he suggested we take > the already-generic xf86-video-modesetting and just add the dri2 code to it. > This is indeed how we started out, however as we progressed it became clear > that the majority of the code we wanted was in the omap driver and were > having to work fairly hard to keep some of the original modesetting code. > This is why we've now changed tactic and just forked the OMAP driver, > something Rob is more than happy for us to do. It does seem like porting to -modesetting, and maybe cleaning up modesetting if its needs it. The modesetting driver is pretty much just a make it work port of the radeon/nouveau/intel code "shared" code. > One thing the DDX driver isn't doing yet is making use of 2D hw blocks. In > the short-term, we will simply create a branch off of the "generic" master > for each SoC and add 2D hardware support there. We do however want a more > permanent solution which doesn't need a separate branch per SoC. Some of the > suggested solutions are: > > * Add a new generic DRM ioctl API for larger 2D operations (I would imagine > small blits/blends would be done in SW). Not going to happen, again the hw isn't generic in this area, some hw requires 3D engines to do 2D ops etc. The limitations on some hw with overlaps etc, and finally it breaks the rule about generic ioctls for acceleration operations. > * Use SW rendering for everything other than solid blits and use v4l2's > blitting API for those (importing/exporting buffers to be blitted using > dma_buf). The theory here is that most UIs are rendered with GLES and so you > only need 2D hardware for blits. I think we'll prototype this approach on > Exynos. Seems a bit over the top, > * Define a new x-server sub-module interface to allow a seperate .so 2D > driver to be loaded (this is the approach the current OMAP DDX uses). This seems the sanest. I haven't time this week to review the code, but I'll try and take a look when time permits. Dave.

13 years, 6 months

3
2
0 0

2025

2024

2023

2022

2021

2020

2019

2018

2017

2016

2015

2014

2013

2012

2011

Linaro-mm-sig