Add support for the dma-buf exporter role to the frame buffer API. The
importer role isn't meaningful for frame buffer devices, as the frame
buffer device model doesn't allow using externally allocated memory.
Signed-off-by: Laurent Pinchart <laurent.pinchart(a)ideasonboard.com>
---
Documentation/fb/api.txt | 36 ++++++++++++++++++++++++++++++++++++
drivers/video/fbmem.c | 36 ++++++++++++++++++++++++++++++++++++
include/linux/fb.h | 12 ++++++++++++
3 files changed, 84 insertions(+), 0 deletions(-)
diff --git a/Documentation/fb/api.txt b/Documentation/fb/api.txt
index d4ff7de..f0b2173 100644
--- a/Documentation/fb/api.txt
+++ b/Documentation/fb/api.txt
@@ -304,3 +304,39 @@ extensions.
Upon successful format configuration, drivers update the fb_fix_screeninfo
type, visual and line_length fields depending on the selected format. The type
and visual fields are set to FB_TYPE_FOURCC and FB_VISUAL_FOURCC respectively.
+
+
+5. DMA buffer sharing
+---------------------
+
+The dma-buf kernel framework allows DMA buffers to be shared across devices
+and applications. Sharing buffers across display devices and video capture or
+video decoding devices allows zero-copy operation when displaying video content
+produced by a hardware device such as a camera or a hardware codec. This is
+crucial to achieving optimal system performance during video display.
+
+While dma-buf supports both exporting internally allocated memory as a dma-buf
+object (known as the exporter role) and importing a dma-buf object to be used
+as device memory (known as the importer role), the frame buffer API only
+supports the exporter role, as the frame buffer device model doesn't support
+using externally-allocated memory.
+
+To export a frame buffer as a dma-buf file descriptor, applications call the
+FBIOGET_DMABUF ioctl. The ioctl takes a pointer to a fb_dmabuf_export
+structure.
+
+struct fb_dmabuf_export {
+ __u32 fd;
+ __u32 flags;
+};
+
+The flags field specifies the flags to be used when creating the dma-buf file
+descriptor. The only supported flag is O_CLOEXEC. If the call is successful,
+the driver will set the fd field to a file descriptor corresponding to the
+dma-buf object.
+
+Applications can then pass the file descriptors to another application or
+another device driver. The dma-buf object is automatically reference-counted;
+applications can and should close the file descriptor as soon as they don't
+need it anymore. The underlying dma-buf object will not be freed before the
+last device that uses the dma-buf object releases it.
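As an illustrative aside (not part of the patch): a minimal userspace sketch of
the ioctl usage documented above. It assumes the patched <linux/fb.h> from this
series and a frame buffer device node at /dev/fb0; the path and error handling
are placeholders.

#include <fcntl.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <linux/fb.h>   /* patched header providing FBIOGET_DMABUF */

int main(void)
{
        struct fb_dmabuf_export exp = { .fd = 0, .flags = O_CLOEXEC };
        int fb = open("/dev/fb0", O_RDWR);

        if (fb < 0) {
                perror("open");
                return 1;
        }

        /* Ask the driver to export the frame buffer as a dma-buf. */
        if (ioctl(fb, FBIOGET_DMABUF, &exp) < 0) {
                perror("FBIOGET_DMABUF");
                return 1;
        }

        /*
         * exp.fd can now be passed to another process or device driver,
         * and closed locally as soon as it is no longer needed.
         */
        printf("exported dma-buf fd %u\n", exp.fd);
        return 0;
}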
diff --git a/drivers/video/fbmem.c b/drivers/video/fbmem.c
index 0dff12a..400e449 100644
--- a/drivers/video/fbmem.c
+++ b/drivers/video/fbmem.c
@@ -15,6 +15,7 @@
#include <linux/compat.h>
#include <linux/types.h>
+#include <linux/dma-buf.h>
#include <linux/errno.h>
#include <linux/kernel.h>
#include <linux/major.h>
@@ -1074,6 +1075,23 @@ fb_blank(struct fb_info *info, int blank)
return ret;
}
+#ifdef CONFIG_DMA_SHARED_BUFFER
+int
+fb_get_dmabuf(struct fb_info *info, int flags)
+{
+ struct dma_buf *dmabuf;
+
+ if (info->fbops->fb_dmabuf_export == NULL)
+ return -ENOTTY;
+
+ dmabuf = info->fbops->fb_dmabuf_export(info);
+ if (IS_ERR(dmabuf))
+ return PTR_ERR(dmabuf);
+
+ return dma_buf_fd(dmabuf, flags);
+}
+#endif
+
static long do_fb_ioctl(struct fb_info *info, unsigned int cmd,
unsigned long arg)
{
@@ -1084,6 +1102,7 @@ static long do_fb_ioctl(struct fb_info *info, unsigned int cmd,
struct fb_cmap cmap_from;
struct fb_cmap_user cmap;
struct fb_event event;
+ struct fb_dmabuf_export dmaexp;
void __user *argp = (void __user *)arg;
long ret = 0;
@@ -1191,6 +1210,23 @@ static long do_fb_ioctl(struct fb_info *info, unsigned int cmd,
console_unlock();
unlock_fb_info(info);
break;
+#ifdef CONFIG_DMA_SHARED_BUFFER
+ case FBIOGET_DMABUF:
+ if (copy_from_user(&dmaexp, argp, sizeof(dmaexp)))
+ return -EFAULT;
+
+ if (!lock_fb_info(info))
+ return -ENODEV;
+ dmaexp.fd = fb_get_dmabuf(info, dmaexp.flags);
+ unlock_fb_info(info);
+
+ if (dmaexp.fd < 0)
+ return dmaexp.fd;
+
+ ret = copy_to_user(argp, &dmaexp, sizeof(dmaexp))
+ ? -EFAULT : 0;
+ break;
+#endif
default:
if (!lock_fb_info(info))
return -ENODEV;
diff --git a/include/linux/fb.h b/include/linux/fb.h
index ac3f1c6..c9fee75 100644
--- a/include/linux/fb.h
+++ b/include/linux/fb.h
@@ -39,6 +39,7 @@
#define FBIOPUT_MODEINFO 0x4617
#define FBIOGET_DISPINFO 0x4618
#define FBIO_WAITFORVSYNC _IOW('F', 0x20, __u32)
+#define FBIOGET_DMABUF _IOR('F', 0x21, struct fb_dmabuf_export)
#define FB_TYPE_PACKED_PIXELS 0 /* Packed Pixels */
#define FB_TYPE_PLANES 1 /* Non interleaved planes */
@@ -403,6 +404,11 @@ struct fb_cursor {
#define FB_BACKLIGHT_MAX 0xFF
#endif
+struct fb_dmabuf_export {
+ __u32 fd;
+ __u32 flags;
+};
+
#ifdef __KERNEL__
#include <linux/fs.h>
@@ -418,6 +424,7 @@ struct vm_area_struct;
struct fb_info;
struct device;
struct file;
+struct dma_buf;
/* Definitions below are used in the parsed monitor specs */
#define FB_DPMS_ACTIVE_OFF 1
@@ -701,6 +708,11 @@ struct fb_ops {
/* called at KDB enter and leave time to prepare the console */
int (*fb_debug_enter)(struct fb_info *info);
int (*fb_debug_leave)(struct fb_info *info);
+
+#ifdef CONFIG_DMA_SHARED_BUFFER
+ /* Export the frame buffer as a dmabuf object */
+ struct dma_buf *(*fb_dmabuf_export)(struct fb_info *info);
+#endif
};
#ifdef CONFIG_FB_TILEBLITTING
--
Regards,
Laurent Pinchart
Hello,
This is a continuation of the dma-mapping extensions posted in the
following thread:
http://thread.gmane.org/gmane.linux.kernel.mm/78644
We noticed that some advanced buffer sharing use cases usually require
creating a DMA mapping for the same memory buffer for more than one
device. Usually such a buffer is also never touched by the CPU; the data
is processed entirely by the devices.
From the DMA-mapping perspective this requires calling one of the
dma_map_{page,single,sg} functions for the given memory buffer several
times, once for each of the devices. Each dma_map_* call performs CPU cache
synchronization, which can be a time-consuming operation, especially when
the buffers are large. We would like to avoid any useless and time-consuming
operations, so that was the main reason for introducing another attribute
for the DMA-mapping subsystem: DMA_ATTR_SKIP_CPU_SYNC, which lets the
dma-mapping core skip CPU cache synchronization in certain cases.
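As an illustrative aside, a minimal kernel-side sketch of the intended usage,
written against the struct dma_attrs API used by this series; dev_a, dev_b,
buf and size are placeholder names, and the buffer is assumed never to be
touched by the CPU.

#include <linux/dma-mapping.h>
#include <linux/dma-attrs.h>

/* Map the same buffer for two devices; only the first mapping performs
 * CPU cache maintenance. */
static int map_for_two_devices(struct device *dev_a, struct device *dev_b,
                               void *buf, size_t size,
                               dma_addr_t *addr_a, dma_addr_t *addr_b)
{
        DEFINE_DMA_ATTRS(attrs);

        /* First mapping: normal behaviour, caches are synchronized. */
        *addr_a = dma_map_single(dev_a, buf, size, DMA_BIDIRECTIONAL);
        if (dma_mapping_error(dev_a, *addr_a))
                return -ENOMEM;

        /* Second mapping: the CPU never touches the data, so the costly
         * cache synchronization can be skipped. */
        dma_set_attr(DMA_ATTR_SKIP_CPU_SYNC, &attrs);
        *addr_b = dma_map_single_attrs(dev_b, buf, size, DMA_BIDIRECTIONAL,
                                       &attrs);
        if (dma_mapping_error(dev_b, *addr_b)) {
                dma_unmap_single(dev_a, *addr_a, size, DMA_BIDIRECTIONAL);
                return -ENOMEM;
        }

        return 0;
}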
The proposed patches have been generated on top of the ARM DMA-mapping
redesign patch series on Linux v3.4-rc7. They are also available on the
following GIT branch:
git://git.linaro.org/people/mszyprowski/linux-dma-mapping.git 3.4-rc7-arm-dma-v10-ext
with all required patches on top of the vanilla v3.4-rc7 kernel. I will
resend them rebased onto v3.5-rc1 soon.
Best regards
Marek Szyprowski
Samsung Poland R&D Center
Patch summary:
Marek Szyprowski (2):
common: DMA-mapping: add DMA_ATTR_SKIP_CPU_SYNC attribute
ARM: dma-mapping: add support for DMA_ATTR_SKIP_CPU_SYNC attribute
Documentation/DMA-attributes.txt | 24 ++++++++++++++++++++++++
arch/arm/mm/dma-mapping.c | 20 +++++++++++---------
include/linux/dma-attrs.h | 1 +
3 files changed, 36 insertions(+), 9 deletions(-)
--
1.7.1.569.g6f426
From: Benjamin Gaignard <benjamin.gaignard(a)linaro.org>
The goal of these patches is to allow ION clients (drivers or userland applications)
to use the Contiguous Memory Allocator (CMA).
To get more info about CMA:
http://lists.linaro.org/pipermail/linaro-mm-sig/2012-February/001328.html
patches version 5:
- port patches onto the Android 3.4 kernel, where ION uses dmabuf
- add ion_cma_heap_map_dma and ion_cma_heap_unmap_dma functions
patches version 4:
- add ION_HEAP_TYPE_DMA heap type in ion_heap_type enum.
- CMA heap is now a "native" ION heap.
- add ion_heap_create_full function to keep backward compatibility.
- clean up included files in CMA heap
- ux500-ion is using ion_heap_create_full instead of ion_heap_create
patches version 3:
- add a private field in the ion_heap structure instead of exposing the
ion_device structure to all heaps
- ion_cma_heap is no longer a platform driver
- ion_cma_heap uses the ion_heap private field to store the device pointer and
make the link with the reserved CMA regions
- provide a ux500-ion driver and a configuration file for the snowball board to
give an example of how to use CMA heaps
patches version 2:
- address review comments from Andy Green
Benjamin Gaignard (4):
fix ion_platform_data definition
add private field in ion_heap structure
add CMA heap
add test/example driver for ux500 platform
arch/arm/mach-ux500/board-mop500.c | 77 ++++++++++++++++
drivers/gpu/ion/Kconfig | 5 ++
drivers/gpu/ion/Makefile | 5 +-
drivers/gpu/ion/ion_cma_heap.c | 175 ++++++++++++++++++++++++++++++++++++
drivers/gpu/ion/ion_heap.c | 18 +++-
drivers/gpu/ion/ion_priv.h | 13 +++
drivers/gpu/ion/ux500/Makefile | 1 +
drivers/gpu/ion/ux500/ux500_ion.c | 142 +++++++++++++++++++++++++++++
include/linux/ion.h | 5 +-
9 files changed, 438 insertions(+), 3 deletions(-)
create mode 100644 drivers/gpu/ion/ion_cma_heap.c
create mode 100644 drivers/gpu/ion/ux500/Makefile
create mode 100644 drivers/gpu/ion/ux500/ux500_ion.c
--
1.7.10
Hi Linus,
I would like to ask you to pull a set of minor fixes for the dma-mapping
code (ARM and x86), required for the Contiguous Memory Allocator (CMA)
patches merged in v3.5-rc1.
The following changes since commit cfaf025112d3856637ff34a767ef785ef5cf2ca9:
Linux 3.5-rc2 (2012-06-08 18:40:09 -0700)
with the top-most commit c080e26edc3a2a3cdfa4c430c663ee1c3bbd8fae
x86: dma-mapping: fix broken allocation when dma_mask has been provided
are available in the git repository at:
git://git.linaro.org/people/mszyprowski/linux-dma-mapping.git fixes-for-linus
Marek Szyprowski (3):
ARM: mm: fix type of the arm_dma_limit global variable
ARM: dma-mapping: fix debug messages in dmabounce code
x86: dma-mapping: fix broken allocation when dma_mask has been provided
Sachin Kamat (1):
ARM: dma-mapping: Add missing static storage class specifier
arch/arm/common/dmabounce.c | 16 ++++++++--------
arch/arm/mm/dma-mapping.c | 4 ++--
arch/arm/mm/init.c | 2 +-
arch/arm/mm/mm.h | 2 +-
arch/x86/kernel/pci-dma.c | 3 ++-
5 files changed, 14 insertions(+), 13 deletions(-)
Thanks!
Best regards
Marek Szyprowski
Samsung Poland R&D Center
Hello,
This is an updated version of the patch series introducing new features
to the DMA-mapping subsystem to let drivers share their allocated buffers
(preferably using the recently introduced dma_buf framework) easily and
efficiently.
The first extension is the DMA_ATTR_NO_KERNEL_MAPPING attribute. It is
intended for use with the dma_{alloc, mmap, free}_attrs functions. It can be
used to notify the dma-mapping core that the driver will not use a kernel
mapping for the allocated buffer at all, so the core can skip creating
one. This saves precious kernel virtual address space. Such a buffer can be
accessed from userspace after calling dma_mmap_attrs() for it (a
typical use case for multimedia buffers). The value returned by
dma_alloc_attrs() with this attribute should be considered a DMA
cookie, which needs to be passed to the dma_mmap_attrs() and
dma_free_attrs() functions.
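A rough sketch of how a driver might use the attribute with the
dma_{alloc,mmap,free}_attrs calls named above; struct my_buffer and the
my_buffer_* helpers are made-up names, not something defined by the series.

#include <linux/dma-attrs.h>
#include <linux/dma-mapping.h>
#include <linux/mm.h>

struct my_buffer {                      /* hypothetical driver-private state */
        void *cookie;                   /* opaque value from dma_alloc_attrs() */
        dma_addr_t dma_handle;
        size_t size;
        struct dma_attrs attrs;
};

static int my_buffer_alloc(struct device *dev, struct my_buffer *buf,
                           size_t size)
{
        init_dma_attrs(&buf->attrs);
        dma_set_attr(DMA_ATTR_NO_KERNEL_MAPPING, &buf->attrs);

        buf->size = size;
        /* No kernel mapping is created; the cookie must not be dereferenced. */
        buf->cookie = dma_alloc_attrs(dev, size, &buf->dma_handle,
                                      GFP_KERNEL, &buf->attrs);
        return buf->cookie ? 0 : -ENOMEM;
}

/* Called from the driver's mmap file operation. */
static int my_buffer_mmap(struct device *dev, struct my_buffer *buf,
                          struct vm_area_struct *vma)
{
        return dma_mmap_attrs(dev, vma, buf->cookie, buf->dma_handle,
                              buf->size, &buf->attrs);
}

static void my_buffer_free(struct device *dev, struct my_buffer *buf)
{
        dma_free_attrs(dev, buf->size, buf->cookie, buf->dma_handle,
                       &buf->attrs);
}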
The second extension is required to let drivers share the buffers
allocated by the DMA-mapping subsystem. Right now the driver gets a DMA
address of the allocated buffer and the kernel virtual mapping for it.
If it wants to share the buffer with another device (i.e. map it into that
device's DMA address space) it usually hacks around kernel virtual
addresses to get pointers to pages, or assumes that both devices share the
DMA address space. Both solutions are just hacks for special cases and
should be avoided in the final version of buffer sharing. To solve this
issue in a generic way, a new DMA-mapping call has been introduced:
dma_get_sgtable(). It allocates a scatter-list which describes the
allocated buffer and lets the driver(s) use it with other device(s) by
calling dma_map_sg() on it.
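A sketch of the generic flow described above, with placeholder device names
and simplified error handling:

#include <linux/dma-mapping.h>
#include <linux/scatterlist.h>

/* Describe a buffer that was allocated for 'dev' and map it for 'other'. */
static int share_with_other_device(struct device *dev, struct device *other,
                                   void *cpu_addr, dma_addr_t dma_addr,
                                   size_t size, struct sg_table *sgt)
{
        int ret;

        /* Build a scatter-list describing the already-allocated buffer. */
        ret = dma_get_sgtable(dev, sgt, cpu_addr, dma_addr, size);
        if (ret < 0)
                return ret;

        /* Map the same pages into the other device's DMA address space. */
        if (!dma_map_sg(other, sgt->sgl, sgt->nents, DMA_BIDIRECTIONAL)) {
                sg_free_table(sgt);
                return -EIO;
        }

        return 0;
}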
The third extension solves the performance issues which we observed with
some advanced buffer sharing use cases, which require creating a DMA
mapping for the same memory buffer for more than one device. From the
DMA-mapping perspective this requires calling one of the
dma_map_{page,single,sg} functions for the given memory buffer several
times, once for each of the devices. Each dma_map_* call performs CPU cache
synchronization, which can be a time-consuming operation, especially when
the buffers are large. We would like to avoid any useless and time-consuming
operations, so that was the main reason for introducing another attribute
for the DMA-mapping subsystem: DMA_ATTR_SKIP_CPU_SYNC, which lets the
dma-mapping core skip CPU cache synchronization in certain cases.
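Combined with the dma_get_sgtable() sketch above, the mapping step for the
second device could then skip the cache maintenance; again only a sketch
with placeholder names.

#include <linux/dma-attrs.h>
#include <linux/dma-mapping.h>
#include <linux/scatterlist.h>

/* Same mapping step as in the previous sketch, but with the CPU cache
 * synchronization skipped for the second device. */
static int map_sgt_nosync(struct device *other, struct sg_table *sgt)
{
        DEFINE_DMA_ATTRS(attrs);

        dma_set_attr(DMA_ATTR_SKIP_CPU_SYNC, &attrs);
        if (!dma_map_sg_attrs(other, sgt->sgl, sgt->nents,
                              DMA_BIDIRECTIONAL, &attrs))
                return -EIO;

        return 0;
}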
The proposed patches have been rebased on the latest Linux kernel
v3.5-rc2 with 'ARM: replace custom consistent dma region with vmalloc'
patches applied (for more information, please refer to the
http://www.spinics.net/lists/arm-kernel/msg179202.html thread).
The patches together with all dependences are also available on the
following GIT branch:
git://git.linaro.org/people/mszyprowski/linux-dma-mapping.git 3.5-rc2-dma-ext-v2
Best regards
Marek Szyprowski
Samsung Poland R&D Center
Changelog:
v2:
- rebased onto v3.5-rc2 and adapted for CMA and dma-mapping changes
- renamed dma_get_sgtable() to dma_get_sgtable_attrs() to match the convention
of the other dma-mapping calls with attributes
- added generic fallback function for dma_get_sgtable() for architectures with
simple dma-mapping implementations
v1: http://thread.gmane.org/gmane.linux.kernel.mm/78644
    http://thread.gmane.org/gmane.linux.kernel.cross-arch/14435 (part 2)
- initial version
Patch summary:
Marek Szyprowski (6):
common: DMA-mapping: add DMA_ATTR_NO_KERNEL_MAPPING attribute
ARM: dma-mapping: add support for DMA_ATTR_NO_KERNEL_MAPPING
attribute
common: dma-mapping: introduce dma_get_sgtable() function
ARM: dma-mapping: add support for dma_get_sgtable()
common: DMA-mapping: add DMA_ATTR_SKIP_CPU_SYNC attribute
ARM: dma-mapping: add support for DMA_ATTR_SKIP_CPU_SYNC attribute
Documentation/DMA-attributes.txt | 42 ++++++++++++++++++
arch/arm/common/dmabounce.c | 1 +
arch/arm/include/asm/dma-mapping.h | 3 +
arch/arm/mm/dma-mapping.c | 69 ++++++++++++++++++++++++------
drivers/base/dma-mapping.c | 18 ++++++++
include/asm-generic/dma-mapping-common.h | 18 ++++++++
include/linux/dma-attrs.h | 2 +
include/linux/dma-mapping.h | 3 +
8 files changed, 142 insertions(+), 14 deletions(-)
--
1.7.1.569.g6f426
Currently, when freeing order-0 pages, CMA pages are treated
the same as regular movable pages, which means they end up
on the per-cpu page list. This means that the CMA pages are
likely to be allocated for something other than contiguous
memory. This increases the chance that the next alloc_contig_range
will fail because pages can't be migrated.
Given the size of the CMA region is typically limited, it is best to
optimize for success of alloc_contig_range as much as possible.
Do this by freeing CMA pages directly instead of putting them
on the per-cpu page lists.
Signed-off-by: Laura Abbott <lauraa(a)codeaurora.org>
---
mm/page_alloc.c | 3 ++-
1 files changed, 2 insertions(+), 1 deletions(-)
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 0e1c6f5..c9a6483 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1310,7 +1310,8 @@ void free_hot_cold_page(struct page *page, int cold)
* excessively into the page allocator
*/
if (migratetype >= MIGRATE_PCPTYPES) {
- if (unlikely(migratetype == MIGRATE_ISOLATE)) {
+ if (unlikely(migratetype == MIGRATE_ISOLATE)
+ || is_migrate_cma(migratetype)) {
free_one_page(zone, page, 0, migratetype);
goto out;
}
--
1.7.8.3
Hello everyone,
These patches add support for DMABUF exporting to the V4L2 stack. The latest
support for DMABUF importing was posted in [1]. The exporter part depends on
the DMA mapping redesign [2], which is not merged into the mainline, and is
therefore posted as a separate patchset. Moreover, some patches depend on the
vmap extension for DMABUF by Dave Airlie [3] and the sg_alloc_table_from_pages
function [4].
Changelog:
v0: (RFC)
- updated setup of VIDIOC_EXPBUF ioctl
- doc updates
- introduced a workaround to avoid using dma_get_pages
- removed caching of the exported dmabuf to avoid a circular reference
between the dmabuf and vb2_dc_buf, or a resource leak
- removed all 'change behaviour' patches
- initial support for exporting in the s5p-mfc driver
- removal of vb2_mmap_pfn_range that is no longer used
- use sg_alloc_table_from_pages instead of creating sglist in vb2_dc code
- move attachment allocation to exporter's attach callback
[1] http://thread.gmane.org/gmane.linux.drivers.video-input-infrastructure/48730
[2] http://thread.gmane.org/gmane.linux.kernel.cross-arch/14098
[3] http://permalink.gmane.org/gmane.comp.video.dri.devel/69302
[4] This patchset is rebased on 3.4-rc1 plus the following patchsets:
Marek Szyprowski (1):
v4l: vb2-dma-contig: let mmap method to use dma_mmap_coherent call
Tomasz Stanislawski (11):
v4l: add buffer exporting via dmabuf
v4l: vb2: add buffer exporting via dmabuf
v4l: vb2-dma-contig: add setup of sglist for MMAP buffers
v4l: vb2-dma-contig: add support for DMABUF exporting
v4l: vb2-dma-contig: add vmap/kmap for dmabuf exporting
v4l: s5p-fimc: support for dmabuf exporting
v4l: s5p-tv: mixer: support for dmabuf exporting
v4l: s5p-mfc: support for dmabuf exporting
v4l: vb2: remove vb2_mmap_pfn_range function
v4l: vb2-dma-contig: use sg_alloc_table_from_pages function
v4l: vb2-dma-contig: Move allocation of dbuf attachment to attach cb
drivers/media/video/s5p-fimc/fimc-capture.c | 9 +
drivers/media/video/s5p-mfc/s5p_mfc_dec.c | 13 ++
drivers/media/video/s5p-mfc/s5p_mfc_enc.c | 13 ++
drivers/media/video/s5p-tv/mixer_video.c | 10 +
drivers/media/video/v4l2-compat-ioctl32.c | 1 +
drivers/media/video/v4l2-dev.c | 1 +
drivers/media/video/v4l2-ioctl.c | 6 +
drivers/media/video/videobuf2-core.c | 67 ++++++
drivers/media/video/videobuf2-dma-contig.c | 323 ++++++++++++++++++++++-----
drivers/media/video/videobuf2-memops.c | 40 ----
include/linux/videodev2.h | 26 +++
include/media/v4l2-ioctl.h | 2 +
include/media/videobuf2-core.h | 2 +
include/media/videobuf2-memops.h | 5 -
14 files changed, 411 insertions(+), 107 deletions(-)
--
1.7.9.5
On Thu, Jun 7, 2012 at 4:35 AM, Tom Cooksey <tom.cooksey(a)arm.com> wrote:
> The alternate is to not associate sync objects with buffers and
> have them be distinct entities, exposed to userspace. This gives
> userpsace more power and flexibility and might allow for use-cases
> which an implicit synchronization mechanism can't satisfy - I'd
> be curious to know any specifics here.
Time and time again we've had problems with implicit synchronization
resulting in bugs where different drivers play by slightly different
implicit rules. We're convinced the best way to attack this problem
is to move as much of the command and control of synchronization as
possible into a single piece of code (the compositor in our case.) To
facilitate this we're going to be mandating this explicit approach in
the K release of Android.
> However, every driver which
> needs to participate in the synchronization mechanism will need
> to have its interface with userspace modified to allow the sync
> objects to be passed to the drivers. This seemed like a lot of
> work to me, which is why I prefer the implicit approach. However
> I don't actually know what work is needed and think it should be
> explored. I.e. How much work is it to add explicit sync object
> support to the DRM & v4l2 interfaces?
>
> E.g. I believe DRM/GEM's job dispatch API is "in-order"
> in which case it might be easy to just add "wait for this fence"
> and "signal this fence" ioctls. Seems like vmwgfx already has
> something similar to this already? Could this work over having
> to specify a list of sync objects to wait on and another list
> of sync objects to signal for every operation (exec buf/page
> flip)? What about for v4l2?
If I understand you right a job submission with explicit sync would
become 3 submissions:
1) submit wait for pre-req fence job
2) submit render job
3) submit signal ready fence job
Does DRM provide a way to ensure these 3 jobs are submitted
atomically? I also expect GPU vendors would like to get clever about
GPU-to-GPU fence dependencies. That could probably be handled
entirely in the userspace GL driver.
> I guess my other thought is that implicit vs explicit is not
> mutually exclusive, though I'd guess there'd be interesting
> deadlocks to have to debug if both were in use _at the same
> time_. :-)
I think this is an approach worth investigating. I'd like a way to
either opt out of implicit sync or have a way to check if a dma-buf
has an attached fence and detach it. Actually, that could work really
well. Consider:
* Each dma_buf has a single fence "slot"
* on submission
* the driver will extract the fence from the dma_buf and queue a wait on it.
* the driver will replace that fence with its own completion
fence before the job submission ioctl returns.
* dma_buf will have two userspace ioctls:
* DETACH: will return the fence as an FD to userspace and clear the
fence slot in the dma_buf
* ATTACH: takes a fence FD from userspace and attaches it to the
dma_buf fence slot. Returns an error if the fence slot is non-empty.
In the Android case, we can do a detach after every submission and an
attach right before.
-Erik
Hey Erik,
Op 07-06-12 19:35, Erik Gilling schreef:
> On Thu, Jun 7, 2012 at 1:55 AM, Maarten Lankhorst
> <m.b.lankhorst(a)gmail.com> wrote:
>> I haven't looked at intel and amd, but from a quick glance
>> it seems like they already implement fencing too, so just
>> some way to synch up the fences on shared buffers seems
>> like it could benefit all graphics drivers and the whole
>> userspace synching could be done away with entirely.
> It's important to have some level of userspace API so that GPU
> generated graphics can participate in the graphics pipeline. Think of
> the case where you have a software video codec streaming textures into
> the GPU. It needs to know when the GPU is done with those textures so
> it can reuse the buffer.
>
In the graphics case this problem already has to be handled without
dma-buf, so adding any extra synchronization API for userspace
that is only used when the bo is shared is a waste.
I do agree you need some way to sync userspace though, but I
think adding a new API for userspace is not the way to go.
Cheers,
Maarten
PS: re-added cc's that seem to have fallen off from your mail.