Hi,
I did a backport of the Contiguous Memory Allocator to a 3.0.8 tree. I
wrote a fairly simple test case that, in 1MB chunks, allocates up to 40MB
from a reserved area, maps, writes, unmaps and then frees it in an infinite
loop. When running this with another program in parallel to put some
stress on the filesystem, I hit data aborts in the filesystem/journal
layer, although not always with the same backtrace. As an example:
[<c02907a4>] (__ext4_check_dir_entry+0x20/0x184) from [<c029e1a8>]
(add_dirent_to_buf+0x70/0x2ac)
[<c029e1a8>] (add_dirent_to_buf+0x70/0x2ac) from [<c029f3f0>]
(ext4_add_entry+0xd8/0x4bc)
[<c029f3f0>] (ext4_add_entry+0xd8/0x4bc) from [<c029fe90>]
(ext4_add_nondir+0x14/0x64)
[<c029fe90>] (ext4_add_nondir+0x14/0x64) from [<c02a04c4>]
(ext4_create+0xd8/0x120)
[<c02a04c4>] (ext4_create+0xd8/0x120) from [<c022e134>]
(vfs_create+0x74/0xa4)
[<c022e134>] (vfs_create+0x74/0xa4) from [<c022ed3c>] (do_last+0x588/0x8d4)
[<c022ed3c>] (do_last+0x588/0x8d4) from [<c022fe64>]
(path_openat+0xc4/0x394)
[<c022fe64>] (path_openat+0xc4/0x394) from [<c0230214>]
(do_filp_open+0x30/0x7c)
[<c0230214>] (do_filp_open+0x30/0x7c) from [<c0220cb4>]
(do_sys_open+0xd8/0x174)
[<c0220cb4>] (do_sys_open+0xd8/0x174) from [<c0105ea0>]
(ret_fast_syscall+0x0/0x30)
Every panic had the same issue where a struct buffer_head [1] had a
b_data that was unexpectedly NULL.
During CMA operation, buffer_migrate_page can be called to migrate
a CMA page to a new page. buffer_migrate_page calls set_bh_page [2]
to set the new page for the buffer_head. If the new page is a highmem
page, though, bh->b_data ends up as NULL, which could produce the
panics seen above.
This seems to indicate that highmem pages are not appropriate for use as
migration targets. The following made the problem go away for me:
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -5753,7 +5753,7 @@ static struct page *
__alloc_contig_migrate_alloc(struct page *page, unsigned long private,
int **resultp)
{
- return alloc_page(GFP_HIGHUSER_MOVABLE);
+ return alloc_page(GFP_USER | __GFP_MOVABLE);
}
Does this seem like an actual issue or is this an artifact of my
backport to 3.0? I'm not familiar enough with the filesystem layer to be
able to tell where highmem can actually be used.
Thanks,
Laura
[1] http://lxr.free-electrons.com/source/include/linux/buffer_head.h#L59
[2] http://lxr.free-electrons.com/source/fs/buffer.c?v=3.0#L1441
--
Sent by an employee of the Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum.
Hello everyone,
This patchset adds support for DMABUF exporting to the V4L2 stack. The latest
support for DMABUF importing was posted in [1]. The exporter part depends on
the DMA mapping redesign [2], which is not merged into mainline; therefore it
is posted as a separate patchset. Moreover, some patches depend on the vmap
extension for DMABUF by Dave Airlie [3] and on the sg_alloc_table_from_pages
function [4].
Changelog:
v0: (RFC)
- updated setup of VIDIOC_EXPBUF ioctl
- doc updates
- introduced workaround to avoid using dma_get_pages
- removed caching of exported dmabuf to avoid a circular reference between
the dmabuf and vb2_dc_buf, or a resource leak
- removed all 'change behaviour' patches
- initial support for exporting in the s5p-mfc driver
- removal of vb2_mmap_pfn_range that is no longer used
- use sg_alloc_table_from_pages instead of creating sglist in vb2_dc code
- move attachment allocation to exporter's attach callback
[1] http://thread.gmane.org/gmane.linux.drivers.video-input-infrastructure/48730
[2] http://thread.gmane.org/gmane.linux.kernel.cross-arch/14098
[3] http://permalink.gmane.org/gmane.comp.video.dri.devel/69302
[4] This patchset is rebased on 3.4-rc1 plus the following patchsets:
Marek Szyprowski (1):
v4l: vb2-dma-contig: let mmap method to use dma_mmap_coherent call
Tomasz Stanislawski (11):
v4l: add buffer exporting via dmabuf
v4l: vb2: add buffer exporting via dmabuf
v4l: vb2-dma-contig: add setup of sglist for MMAP buffers
v4l: vb2-dma-contig: add support for DMABUF exporting
v4l: vb2-dma-contig: add vmap/kmap for dmabuf exporting
v4l: s5p-fimc: support for dmabuf exporting
v4l: s5p-tv: mixer: support for dmabuf exporting
v4l: s5p-mfc: support for dmabuf exporting
v4l: vb2: remove vb2_mmap_pfn_range function
v4l: vb2-dma-contig: use sg_alloc_table_from_pages function
v4l: vb2-dma-contig: Move allocation of dbuf attachment to attach cb
drivers/media/video/s5p-fimc/fimc-capture.c | 9 +
drivers/media/video/s5p-mfc/s5p_mfc_dec.c | 13 ++
drivers/media/video/s5p-mfc/s5p_mfc_enc.c | 13 ++
drivers/media/video/s5p-tv/mixer_video.c | 10 +
drivers/media/video/v4l2-compat-ioctl32.c | 1 +
drivers/media/video/v4l2-dev.c | 1 +
drivers/media/video/v4l2-ioctl.c | 6 +
drivers/media/video/videobuf2-core.c | 67 ++++++
drivers/media/video/videobuf2-dma-contig.c | 323 ++++++++++++++++++++++-----
drivers/media/video/videobuf2-memops.c | 40 ----
include/linux/videodev2.h | 26 +++
include/media/v4l2-ioctl.h | 2 +
include/media/videobuf2-core.h | 2 +
include/media/videobuf2-memops.h | 5 -
14 files changed, 411 insertions(+), 107 deletions(-)
--
1.7.9.5
Hello everyone,
This patchset adds support for DMABUF [2] importing to the V4L2 stack.
The support for DMABUF exporting was moved to a separate patchset
due to its dependency on the DMA mapping redesign patches by
Marek Szyprowski [4].
v6:
- fixed missing entry in v4l2_memory_names
- fixed a bug occurring after a get_user_pages failure
- fixed a bug caused by using an invalid vma for get_user_pages
- prepare/finish no longer call dma_sync for dmabuf buffers
v5:
- removed change of importer/exporter behaviour
- fixed vb2_dc_pages_to_sgt based on Laurent's hints
- changed pin/unpin words to lock/unlock in Doc
v4:
- rebased on mainline 3.4-rc2
- included missing importing support for s5p-fimc and s5p-tv
- added patch for changing map/unmap for importers
- fixes to Documentation part
- coding style fixes
- pairing {map/unmap}_dmabuf in vb2-core
- fixed variable types and semantics of arguments in videobuf2-dma-contig.c
v3:
- rebased on mainline 3.4-rc1
- split 'code refactor' patch to multiple smaller patches
- squashed fixes to Sumit's patches
- patchset is no longer dependent on 'DMA mapping redesign'
- separated path for handling IO and non-IO mappings
- add documentation for DMABUF importing to V4L
- removed all DMABUF exporter related code
- removed usage of dma_get_pages extension
v2:
- extended VIDIOC_EXPBUF argument from integer memoffset to struct
v4l2_exportbuffer
- added a patch that breaks the DMABUF spec on (un)map_attachment callbacks
but allows working with the existing implementation of DMABUF PRIME in DRM
- all dma-contig code refactoring patches were squashed
- bugfixes
v1: List of changes since [1].
- support for the DMA API extension dma_get_pages; the function is used to
retrieve the pages used to create a DMA mapping
- small fixes/code cleanup to videobuf2
- added prepare and finish callbacks to vb2 allocators; they are used to keep
coherency between DMA and CPU access to the memory (by Marek Szyprowski)
- support for exporting of DMABUF buffer in V4L2 and Videobuf2, originated from
[3].
- support for dma-buf exporting in vb2-dma-contig allocator
- support for DMABUF for s5p-tv and s5p-fimc (capture interface) drivers,
originated from [3]
- changed handling of userptr buffers (by Marek Szyprowski, Andrzej
Pietrasiewicz)
- let the mmap method use the dma_mmap_writecombine call (by Marek Szyprowski)
[1] http://thread.gmane.org/gmane.linux.drivers.video-input-infrastructure/4296…
[2] https://lkml.org/lkml/2011/12/26/29
[3] http://thread.gmane.org/gmane.linux.drivers.video-input-infrastructure/3635…
[4] http://thread.gmane.org/gmane.linux.kernel.cross-arch/12819
Laurent Pinchart (2):
v4l: vb2-dma-contig: Shorten vb2_dma_contig prefix to vb2_dc
v4l: vb2-dma-contig: Reorder functions
Marek Szyprowski (2):
v4l: vb2: add prepare/finish callbacks to allocators
v4l: vb2-dma-contig: add prepare/finish to dma-contig allocator
Sumit Semwal (4):
v4l: Add DMABUF as a memory type
v4l: vb2: add support for shared buffer (dma_buf)
v4l: vb: remove warnings about MEMORY_DMABUF
v4l: vb2-dma-contig: add support for dma_buf importing
Tomasz Stanislawski (5):
Documentation: media: description of DMABUF importing in V4L2
v4l: vb2-dma-contig: Remove unneeded allocation context structure
v4l: vb2-dma-contig: add support for scatterlist in userptr mode
v4l: s5p-tv: mixer: support for dmabuf importing
v4l: s5p-fimc: support for dmabuf importing
Documentation/DocBook/media/v4l/compat.xml | 4 +
Documentation/DocBook/media/v4l/io.xml | 179 +++++++
.../DocBook/media/v4l/vidioc-create-bufs.xml | 1 +
Documentation/DocBook/media/v4l/vidioc-qbuf.xml | 15 +
Documentation/DocBook/media/v4l/vidioc-reqbufs.xml | 45 +-
drivers/media/video/s5p-fimc/Kconfig | 1 +
drivers/media/video/s5p-fimc/fimc-capture.c | 2 +-
drivers/media/video/s5p-tv/Kconfig | 1 +
drivers/media/video/s5p-tv/mixer_video.c | 2 +-
drivers/media/video/v4l2-ioctl.c | 1 +
drivers/media/video/videobuf-core.c | 4 +
drivers/media/video/videobuf2-core.c | 207 +++++++-
drivers/media/video/videobuf2-dma-contig.c | 520 +++++++++++++++++---
include/linux/videodev2.h | 7 +
include/media/videobuf2-core.h | 34 ++
15 files changed, 924 insertions(+), 99 deletions(-)
--
1.7.9.5
Hi Linus,
I would like to ask for pulling Contiguous Memory Allocator (CMA) and
ARM DMA-mapping framework updates for v3.5.
The following changes since commit 76e10d158efb6d4516018846f60c2ab5501900bc:
  Linux 3.4 (2012-05-20 15:29:13 -0700)
are available in the git repository at:
  git://git.linaro.org/people/mszyprowski/linux-dma-mapping.git for-linus
with the top-most commit 0f51596bd39a5c928307ffcffc9ba07f90f42a8b:
  Merge branch 'for-next-arm-dma' into for-linus
These patches contain two major updates for the DMA mapping subsystem (mainly
for the ARM architecture). The first is the Contiguous Memory Allocator (CMA),
which makes it possible for device drivers to allocate big contiguous chunks
of memory after the system has booted.
The main difference from similar frameworks is that CMA transparently reuses
the memory region reserved for big-chunk allocations as regular system memory,
so no memory is wasted when no big chunk is allocated. Once an allocation
request is issued, the framework migrates system pages away to create space
for the required big chunk of physically contiguous memory.
For more information one can refer to these LWN articles:
'A reworked contiguous memory allocator': http://lwn.net/Articles/447405/
'CMA and ARM': http://lwn.net/Articles/450286/
'A deep dive into CMA': http://lwn.net/Articles/486301/
and the following thread with the patches and links to all previous versions:
https://lkml.org/lkml/2012/4/3/204
The main client for this new framework is ARM DMA-mapping subsystem.
The second part provides a complete redesign of the ARM DMA-mapping
subsystem. The core implementation has been changed to use the common struct
dma_map_ops based infrastructure, with the recent updates for the new DMA
attributes merged in v3.4-rc2. This makes it possible to use more than one
implementation of the dma-mapping calls and to change/select them on a
per-device basis. The first client of this new infrastructure is the
dmabounce implementation, which has been completely cut out of the core,
common code.
The last patch of this redesign update introduces a new, experimental
implementation of the dma-mapping calls on top of the generic IOMMU
framework. This lets ARM sub-platforms transparently use an IOMMU for
DMA-mapping calls if the required IOMMU hardware is available.
For more information please refer to the following thread:
http://www.spinics.net/lists/arm-kernel/msg175729.html
The last patch merges changes from both updates and provides a
resolution for the conflicts which could not be avoided, since patches were
applied to the same files (mainly arch/arm/mm/dma-mapping.c).
Thanks!
Best regards
Marek Szyprowski
Samsung Poland R&D Center
Patch summary:
Marek Szyprowski (17):
common: add dma_mmap_from_coherent() function
ARM: dma-mapping: use dma_mmap_from_coherent()
ARM: dma-mapping: use pr_* instread of printk
ARM: dma-mapping: introduce DMA_ERROR_CODE constant
ARM: dma-mapping: remove offset parameter to prepare for generic dma_ops
ARM: dma-mapping: use asm-generic/dma-mapping-common.h
ARM: dma-mapping: implement dma sg methods on top of any generic dma ops
ARM: dma-mapping: move all dma bounce code to separate dma ops structure
ARM: dma-mapping: remove redundant code and do the cleanup
ARM: dma-mapping: use alloc, mmap, free from dma_ops
ARM: dma-mapping: add support for IOMMU mapper
mm: extract reclaim code from __alloc_pages_direct_reclaim()
mm: trigger page reclaim in alloc_contig_range() to stabilise watermarks
drivers: add Contiguous Memory Allocator
X86: integrate CMA with DMA-mapping subsystem
ARM: integrate CMA with DMA-mapping subsystem
Merge branch 'for-next-arm-dma' into for-linus
Mel Gorman (1):
mm: Serialize access to min_free_kbytes
Michal Nazarewicz (9):
mm: page_alloc: remove trailing whitespace
mm: compaction: introduce isolate_migratepages_range()
mm: compaction: introduce map_pages()
mm: compaction: introduce isolate_freepages_range()
mm: compaction: export some of the functions
mm: page_alloc: introduce alloc_contig_range()
mm: page_alloc: change fallbacks array handling
mm: mmzone: MIGRATE_CMA migration type added
mm: page_isolation: MIGRATE_CMA isolation functions added
Minchan Kim (1):
cma: fix migration mode
Vitaly Andrianov (1):
ARM: dma-mapping: use PMD size for section unmap
Documentation/kernel-parameters.txt | 9 +
arch/Kconfig | 3 +
arch/arm/Kconfig | 11 +
arch/arm/common/dmabounce.c | 84 ++-
arch/arm/include/asm/device.h | 4 +
arch/arm/include/asm/dma-contiguous.h | 15 +
arch/arm/include/asm/dma-iommu.h | 34 +
arch/arm/include/asm/dma-mapping.h | 407 +++--------
arch/arm/include/asm/mach/map.h | 1 +
arch/arm/kernel/setup.c | 9 +-
arch/arm/mm/dma-mapping.c | 1348 ++++++++++++++++++++++++++++-----
arch/arm/mm/init.c | 23 +-
arch/arm/mm/mm.h | 3 +
arch/arm/mm/mmu.c | 31 +-
arch/arm/mm/vmregion.h | 2 +-
arch/x86/Kconfig | 1 +
arch/x86/include/asm/dma-contiguous.h | 13 +
arch/x86/include/asm/dma-mapping.h | 5 +
arch/x86/kernel/pci-dma.c | 18 +-
arch/x86/kernel/pci-nommu.c | 8 +-
arch/x86/kernel/setup.c | 2 +
drivers/base/Kconfig | 89 +++
drivers/base/Makefile | 1 +
drivers/base/dma-coherent.c | 42 +
drivers/base/dma-contiguous.c | 401 ++++++++++
include/asm-generic/dma-coherent.h | 4 +-
include/asm-generic/dma-contiguous.h | 28 +
include/linux/device.h | 4 +
include/linux/dma-contiguous.h | 110 +++
include/linux/gfp.h | 12 +
include/linux/mmzone.h | 47 +-
include/linux/page-isolation.h | 18 +-
mm/Kconfig | 2 +-
mm/Makefile | 3 +-
mm/compaction.c | 418 +++++++----
mm/internal.h | 33 +
mm/memory-failure.c | 2 +-
mm/memory_hotplug.c | 6 +-
mm/page_alloc.c | 409 +++++++++--
mm/page_isolation.c | 15 +-
mm/vmstat.c | 3 +
41 files changed, 2898 insertions(+), 780 deletions(-)
create mode 100644 arch/arm/include/asm/dma-contiguous.h
create mode 100644 arch/arm/include/asm/dma-iommu.h
create mode 100644 arch/x86/include/asm/dma-contiguous.h
create mode 100644 drivers/base/dma-contiguous.c
create mode 100644 include/asm-generic/dma-contiguous.h
create mode 100644 include/linux/dma-contiguous.h
Hello!
Recent changes to ioremap and the unification of vmalloc regions on ARM
significantly reduce the possible size of the consistent DMA region,
severely limiting the allowed size of DMA coherent/writecombine allocations.
This experimental patchset replaces the use of custom consistent DMA regions
in the dma-mapping framework in favour of generic vmalloc areas created on
demand for each coherent or writecombine allocation. The main purpose of this
patchset is to remove the 2MiB limit on DMA coherent/writecombine allocations.
Atomic allocations are served from a special pool preallocated at boot,
because vmalloc areas cannot be reliably created in atomic context.
This patch is based on vanilla v3.4-rc7 release.
Atomic allocations have been tested with s3c-sdhci driver on Samsung
UniversalC210 board with dmabounce code enabled to force
dma_alloc_coherent() use on each dma_map_* call (some of them are made
from interrupts).
Best regards
Marek Szyprowski
Samsung Poland R&D Center
Changelog:
v2:
- added support for atomic allocations (served from preallocated pool)
- minor cleanup here and there
- rebased onto v3.4-rc7
v1: http://thread.gmane.org/gmane.linux.kernel.mm/76703
- initial version
Patch summary:
Marek Szyprowski (4):
mm: vmalloc: use const void * for caller argument
mm: vmalloc: export find_vm_area() function
mm: vmalloc: add VM_DMA flag to indicate areas used by dma-mapping
framework
ARM: dma-mapping: remove custom consistent dma region
Documentation/kernel-parameters.txt | 4 +
arch/arm/include/asm/dma-mapping.h | 2 +-
arch/arm/mm/dma-mapping.c | 360 ++++++++++++++++-------------------
include/linux/vmalloc.h | 10 +-
mm/vmalloc.c | 31 ++--
5 files changed, 185 insertions(+), 196 deletions(-)
--
1.7.10.1
Hi Linus,
Here's the first signed-tag pull request for dma-buf framework.
Could you please pull the dma-buf updates for 3.5? This includes the
following key items:
- mmap support
- vmap support
- related documentation updates
These are needed by various drivers to allow mmap/vmap of dma-buf
shared buffers. Dave Airlie has some prime patches dependent on the
vmap pull as well.
Thanks and best regards,
~Sumit.
The following changes since commit 76e10d158efb6d4516018846f60c2ab5501900bc:
Linux 3.4 (2012-05-20 15:29:13 -0700)
are available in the git repository at:
ssh://sumitsemwal@git.linaro.org/~/public_git/linux-dma-buf.git
tags/tag-for-linus-3.5
for you to fetch changes up to b25b086d23eb852bf3cfdeb60409b4967ebb3c0c:
dma-buf: add initial vmap documentation (2012-05-25 12:51:11 +0530)
----------------------------------------------------------------
dma-buf updates for 3.5
----------------------------------------------------------------
Daniel Vetter (1):
dma-buf: mmap support
Dave Airlie (2):
dma-buf: add vmap interface
dma-buf: add initial vmap documentation
Sumit Semwal (1):
dma-buf: minor documentation fixes.
Documentation/dma-buf-sharing.txt | 109 ++++++++++++++++++++++++++++++++++---
drivers/base/dma-buf.c | 99 ++++++++++++++++++++++++++++++++-
include/linux/dma-buf.h | 33 +++++++++++
3 files changed, 233 insertions(+), 8 deletions(-)
Hi All,
I realise it's been a while since this was last discussed, however I'd like
to bring up kernel-side synchronization again. By kernel-side
synchronization, I mean allowing multiple drivers/devices wanting to access
the same buffer to do so without bouncing up to userspace to resolve
dependencies such as "the display controller can't start scanning out a
buffer until the GPU has finished rendering into it". As such, this is
really just an optimization which reduces the latency between, e.g., the GPU
finishing a rendering job and that buffer being scanned out. I appreciate
this particular example is already solved on desktop graphics cards as the
display controller and 3D core are both controlled by the same driver, so no
"generic" mechanism is needed. However on ARM SoCs, the 3D core (like an ARM
Mali) and display controller tend to be driven by separate drivers, so some
mechanism is needed to allow both drivers to synchronize their access to
buffers.
There are multiple ways synchronization can be achieved, fences/sync objects
is one common approach, however we're presenting a different approach.
Personally, I quite like fence sync objects; however, we believe that approach
requires a lot of userspace interfaces to be changed to pass around sync
object handles. Our hope is that the kds approach will require less effort to
make use of, as no existing userspace interfaces need to be changed. E.g. to
use explicit fences, struct drm_mode_crtc_page_flip would need new members
to pass in the handle(s) of the sync object(s) which the flip depends on (i.e.
don't flip until these fences fire). An additional benefit of our approach
is that it prevents userspace from specifying dependency loops which can cause
a deadlock (see kds.txt for an explanation of what I mean here).
I have waited until now to bring this up again because I am now able to
share the code I was trying (and failing, I think) to explain previously. The
code has now been released under the GPLv2 on ARM Mali's developer portal;
however, I've attempted to turn that into a patch to allow it to be discussed
on this list. Please find the patch inline below.
While KDS defines a very generic mechanism, I am proposing that this code, or
at least its concepts, be merged with the existing dma_buf code, so that the
struct kds_resource members get moved into struct dma_buf, kds_* functions
get renamed to dma_buf_* functions, etc. So I guess what I'm saying is:
please don't review the actual code just yet, only the concepts the code
describes, where kds_resource == dma_buf.
Cheers,
Tom
Author: Tom Cooksey <tom.cooksey(a)arm.com>
Date: Fri May 25 10:45:27 2012 +0100
Add new system to allow synchronizing access to resources
See Documentation/kds.txt for details, however the general
idea is that this kds framework synchronizes multiple drivers
("clients") wanting to access the same resources, where a
resource is typically a 2D image buffer being shared around
using dma-buf.
Note: This patch is created by extracting the sources from the
tarball on <http://www.malideveloper.com/open-source-mali-gpus-lin
ux-kernel-device-drivers---dev-releases.php> and putting them in
roughly the right places.
diff --git a/Documentation/kds.txt b/Documentation/kds.txt
new file mode 100644
index 0000000..a96db21
--- /dev/null
+++ b/Documentation/kds.txt
@@ -0,0 +1,113 @@
+#
+# (C) COPYRIGHT 2012 ARM Limited. All rights reserved.
+#
+# This program is free software and is provided to you under the terms of
+# the GNU General Public License version 2 as published by the Free Software
+# Foundation, and any use by you of this program is subject to the terms of
+# such GNU licence.
+#
+# A copy of the licence is included with the program, and can also be
+# obtained from Free Software Foundation, Inc., 51 Franklin Street, Fifth
+# Floor, Boston, MA 02110-1301, USA.
+#
+#
+
+
+==============================
+kds - Kernel Dependency System
+==============================
+
+Introduction
+------------
+kds provides a mechanism for clients to atomically lock down multiple
+abstract resources.
+This can be done either synchronously or asynchronously.
+Abstract resources are used so that a set of clients can use kds to control
+access to any kind of resource; an example is structured memory buffers.
+
+kds supports both exclusive locking of a buffer and shared access to buffers.
+
+kds can be built either as an integrated feature of the kernel or as a module.
+It supports being compiled as a module both in-tree and out-of-tree.
+
+
+Concepts
+--------
+A core concept in kds is abstract resources.
+A kds resource is just an abstraction for some client object; kds doesn't
+care what it is.
+Typically EGL will consider UMP buffers as being a resource, thus each UMP
+buffer has a kds resource for synchronization to the buffer.
+
+kds allows a client to create and destroy the abstract resource objects.
+A new resource object is made available immediately (it is just a simple
+malloc with some initializations),
+while destroying one requires some external synchronization.
+
+The other core concept in kds is the consumer of resources.
+kds is requested to allow a client to consume a set of resources, and the
+client will be notified when it can consume the resources.
+
+Exclusive access allows only one client to consume a resource.
+Shared access permits multiple consumers to access a resource concurrently.
+
+
+APIs
+----
+kds provides simple resource allocate and destroy functions.
+Clients use these to instantiate and control the lifetime of the resources
+kds manages.
+
+kds provides two ways to wait for resources:
+- Asynchronous wait: the client specifies a function pointer to be called
+when the wait is over.
+- Synchronous wait: the function blocks until access is gained.
+
+The synchronous API has a timeout for the wait.
+The call can return early if a signal is delivered.
+
+After a client is done consuming the resources, kds must be notified to
+release them and let some other client take ownership.
+This is done via the resource set release call.
+
+A Windows comparison:
+kds implements WaitForMultipleObjectsEx(..., bWaitAll = TRUE, ...) but also
+has an asynchronous version in addition.
+kds resources can be seen as being the same as NT object manager resources.
+
+Internals
+---------
+kds guarantees atomicity when a set of resources is operated on.
+This is implemented via a global resource lock which is taken by kds when
+it updates resource objects.
+
+Internally a resource in kds is a linked list head with some flags.
+
+When a consumer requests access to a set of resources it is queued on each
+of the resources.
+The link from the consumer to a resource can be triggered. Once all links
+are triggered the registered callback is called or the blocking function
+returns.
+A link is considered triggered if it is the first on the list of consumers
+of a resource, or if all the links ahead of it are marked as shared and it
+is itself of the shared type.
+
+When the client is done consuming, the consumer object is removed from the
+linked lists of the resources and a potential new consumer becomes the head
+of the resources.
+As we add and remove consumers atomically across all resources we can
+guarantee that we never introduce an A->B + B->A type of loop/deadlock.
+
+
+kbase/base implementation
+-------------------------
+A HW job needs access to a set of shared resources.
+EGL tracks this and encodes the set along with the atom in the ringbuffer.
+EGL allocates a (k)base dep object to represent the dependency on the set
+of resources and encodes it along with the list of resources.
+This dep object is used to create a dependency from a job chain (atom) to
+the resources it needs to run.
+When kbase decodes the atom in the ringbuffer it finds the set of resources
+and calls kds to request all the needed resources.
+As EGL needs to know when the kds request is delivered, a new base event
+object is needed: atom enqueued. This event is only delivered for atoms
+which use kds.
+The callback kbase registers triggers the dependency object described above,
+which in turn triggers the existing JD system to release the job chain.
+When the atom is done, kds resource set release is called to release the
+resources.
+
+EGL will typically use exclusive access to the render target, while all
+buffers used as input can be marked as shared.
+
+
+Buffer publish/vsync
+--------------------
+EGL will use a separate ioctl or DRM flip to request the flip.
+If the LCD driver is integrated with kds, EGL can do these operations early.
+The LCD driver must then implement the ioctl or DRM flip to be asynchronous
+with the kds async call.
+The LCD driver binds a kds resource to each virtual buffer (2 buffers in
+case of double-buffering).
+EGL will make a dependency to the target kds resource in the kbase atom.
+After EGL receives an atom enqueued event it can ask the LCD driver to pan
+to the target kds resource.
+When the atom is completed it'll release the resource and the LCD driver
+will get its callback.
+In the callback it'll load the target buffer into the DMA unit of the LCD
+hardware.
+The LCD driver will be the consumer of both buffers for a short period.
+The LCD driver will call kds resource set release on the previous on-screen
+buffer when the next vsync/DMA read end is handled.
+
+
diff --git a/drivers/misc/kds.c b/drivers/misc/kds.c
new file mode 100644
index 0000000..8d7d55e
--- /dev/null
+++ b/drivers/misc/kds.c
@@ -0,0 +1,461 @@
+/*
+ *
+ * (C) COPYRIGHT 2012 ARM Limited. All rights reserved.
+ *
+ * This program is free software and is provided to you under the terms of
+ * the GNU General Public License version 2 as published by the Free Software
+ * Foundation, and any use by you of this program is subject to the terms of
+ * such GNU licence.
+ *
+ * A copy of the licence is included with the program, and can also be
+ * obtained from Free Software Foundation, Inc., 51 Franklin Street, Fifth
+ * Floor, Boston, MA 02110-1301, USA.
+ *
+ */
+
+
+
+#include <linux/slab.h>
+#include <linux/list.h>
+#include <linux/mutex.h>
+#include <linux/wait.h>
+#include <linux/sched.h>
+#include <linux/err.h>
+#include <linux/module.h>
+#include <linux/workqueue.h>
+#include <linux/kds.h>
+
+
+#define KDS_LINK_TRIGGERED (1u << 0)
+#define KDS_LINK_EXCLUSIVE (1u << 1)
+
+#define KDS_IGNORED NULL
+#define KDS_INVALID (void*)-2
+#define KDS_RESOURCE (void*)-1
+
+struct kds_resource_set
+{
+ unsigned long num_resources;
+ unsigned long pending;
+ unsigned long locked_resources;
+ struct kds_callback * cb;
+ void * callback_parameter;
+ void * callback_extra_parameter;
+ struct list_head callback_link;
+ struct work_struct callback_work;
+ struct kds_link resources[0];
+};
+
+static DEFINE_MUTEX(kds_lock);
+
+int kds_callback_init(struct kds_callback * cb, int direct, kds_callback_fn user_cb)
+{
+ int ret = 0;
+
+ cb->direct = direct;
+ cb->user_cb = user_cb;
+
+ if (!direct)
+ {
+ cb->wq = alloc_workqueue("kds", WQ_UNBOUND | WQ_HIGHPRI, WQ_UNBOUND_MAX_ACTIVE);
+ if (!cb->wq)
+ ret = -ENOMEM;
+ }
+ else
+ {
+ cb->wq = NULL;
+ }
+
+ return ret;
+}
+EXPORT_SYMBOL(kds_callback_init);
+
+void kds_callback_term(struct kds_callback * cb)
+{
+ if (!cb->direct)
+ {
+ BUG_ON(!cb->wq);
+ destroy_workqueue(cb->wq);
+ }
+ else
+ {
+ BUG_ON(cb->wq);
+ }
+}
+
+EXPORT_SYMBOL(kds_callback_term);
+
+static void kds_do_user_callback(struct kds_resource_set * rset)
+{
+ rset->cb->user_cb(rset->callback_parameter, rset->callback_extra_parameter);
+}
+
+static void kds_queued_callback(struct work_struct * work)
+{
+ struct kds_resource_set * rset;
+ rset = container_of( work, struct kds_resource_set, callback_work);
+
+ kds_do_user_callback(rset);
+}
+
+static void kds_callback_perform(struct kds_resource_set * rset)
+{
+ if (rset->cb->direct)
+ kds_do_user_callback(rset);
+ else
+ {
+ int result;
+ result = queue_work(rset->cb->wq, &rset->callback_work);
+ /* if we got a 0 return it means we've triggered the same rset twice! */
+ BUG_ON(!result);
+ }
+}
+
+void kds_resource_init(struct kds_resource * res)
+{
+ BUG_ON(!res);
+ INIT_LIST_HEAD(&res->waiters.link);
+ res->waiters.parent = KDS_RESOURCE;
+}
+EXPORT_SYMBOL(kds_resource_init);
+
+void kds_resource_term(struct kds_resource * res)
+{
+ BUG_ON(!res);
+ BUG_ON(!list_empty(&res->waiters.link));
+ res->waiters.parent = KDS_INVALID;
+}
+EXPORT_SYMBOL(kds_resource_term);
+
+int kds_async_waitall(
+ struct kds_resource_set ** pprset,
+ unsigned long flags,
+ struct kds_callback * cb,
+ void * callback_parameter,
+ void * callback_extra_parameter,
+ int number_resources,
+ unsigned long * exclusive_access_bitmap,
+ struct kds_resource ** resource_list)
+{
+ struct kds_resource_set * rset = NULL;
+ int i;
+ int triggered;
+ int err = -EFAULT;
+
+ BUG_ON(!pprset);
+ BUG_ON(!resource_list);
+ BUG_ON(!cb);
+
+ mutex_lock(&kds_lock);
+
+ if ((flags & KDS_FLAG_LOCKED_ACTION) == KDS_FLAG_LOCKED_FAIL)
+ {
+ for (i = 0; i < number_resources; i++)
+ {
+ if (resource_list[i]->lock_count)
+ {
+ err = -EBUSY;
+ goto errout;
+ }
+ }
+ }
+
+ rset = kmalloc(sizeof(*rset) + number_resources * sizeof(struct kds_link), GFP_KERNEL);
+ if (!rset)
+ {
+ err = -ENOMEM;
+ goto errout;
+ }
+
+ rset->num_resources = number_resources;
+ rset->pending = number_resources;
+ rset->locked_resources = 0;
+ rset->cb = cb;
+ rset->callback_parameter = callback_parameter;
+ rset->callback_extra_parameter = callback_extra_parameter;
+ INIT_LIST_HEAD(&rset->callback_link);
+ INIT_WORK(&rset->callback_work, kds_queued_callback);
+
+ for (i = 0; i < number_resources; i++)
+ {
+ unsigned long link_state = 0;
+
+ INIT_LIST_HEAD(&rset->resources[i].link);
+ rset->resources[i].parent = rset;
+
+ if (test_bit(i, exclusive_access_bitmap))
+ {
+ link_state |= KDS_LINK_EXCLUSIVE;
+ }
+
+ /* no-one else waiting? */
+ if (list_empty(&resource_list[i]->waiters.link))
+ {
+ link_state |= KDS_LINK_TRIGGERED;
+ rset->pending--;
+ }
+ /* Adding a non-exclusive and the current tail is a triggered non-exclusive? */
+ else if (((link_state & KDS_LINK_EXCLUSIVE) == 0) &&
+ (((list_entry(resource_list[i]->waiters.link.prev, struct kds_link, link)->state & (KDS_LINK_EXCLUSIVE | KDS_LINK_TRIGGERED)) == KDS_LINK_TRIGGERED)))
+ {
+ link_state |= KDS_LINK_TRIGGERED;
+ rset->pending--;
+ }
+ /* locked & ignore locked? */
+ else if ((resource_list[i]->lock_count) && ((flags & KDS_FLAG_LOCKED_ACTION) == KDS_FLAG_LOCKED_IGNORE))
+ {
+ link_state |= KDS_LINK_TRIGGERED;
+ rset->pending--;
+ rset->resources[i].parent = KDS_IGNORED; /* to disable decrementing the pending count when we get the ignored resource */
+ }
+ rset->resources[i].state = link_state;
+ list_add_tail(&rset->resources[i].link, &resource_list[i]->waiters.link);
+ }
+
+ triggered = (rset->pending == 0);
+
+ mutex_unlock(&kds_lock);
+
+ /* set the pointer before the callback is called so it sees it */
+ *pprset = rset;
+
+ if (triggered)
+ {
+ /* all resources obtained, trigger callback */
+ kds_callback_perform(rset);
+ }
+
+ return 0;
+
+errout:
+ mutex_unlock(&kds_lock);
+ return err;
+}
+EXPORT_SYMBOL(kds_async_waitall);
+
+static void wake_up_sync_call(void * callback_parameter, void * callback_extra_parameter)
+{
+ wait_queue_head_t * wait = (wait_queue_head_t*)callback_parameter;
+ wake_up(wait);
+}
+
+static struct kds_callback sync_cb =
+{
+ wake_up_sync_call,
+ 1,
+ NULL,
+};
+
+struct kds_resource_set * kds_waitall(
+ int number_resources,
+ unsigned long * exclusive_access_bitmap,
+ struct kds_resource ** resource_list,
+ unsigned long jiffies_timeout)
+{
+ struct kds_resource_set * rset;
+ int i;
+ int triggered = 0;
+ DECLARE_WAIT_QUEUE_HEAD_ONSTACK(wake);
+
+ rset = kmalloc(sizeof(*rset) + number_resources * sizeof(struct kds_link), GFP_KERNEL);
+ if (!rset)
+ return rset;
+
+ rset->num_resources = number_resources;
+ rset->pending = number_resources;
+ rset->locked_resources = 1;
+ INIT_LIST_HEAD(&rset->callback_link);
+ INIT_WORK(&rset->callback_work, kds_queued_callback);
+
+ mutex_lock(&kds_lock);
+
+ for (i = 0; i < number_resources; i++)
+ {
+ unsigned long link_state = 0;
+
+ if (likely(resource_list[i]->lock_count < ULONG_MAX))
+ resource_list[i]->lock_count++;
+ else
+ break;
+
+ if (test_bit(i, exclusive_access_bitmap))
+ {
+ link_state |= KDS_LINK_EXCLUSIVE;
+ }
+
+ if (list_empty(&resource_list[i]->waiters.link))
+ {
+ link_state |= KDS_LINK_TRIGGERED;
+ rset->pending--;
+ }
+ /* Adding a non-exclusive and the current tail is a triggered non-exclusive? */
+ else if (((link_state & KDS_LINK_EXCLUSIVE) == 0) &&
+ (((list_entry(resource_list[i]->waiters.link.prev, struct kds_link, link)->state & (KDS_LINK_EXCLUSIVE | KDS_LINK_TRIGGERED)) == KDS_LINK_TRIGGERED)))
+ {
+ link_state |= KDS_LINK_TRIGGERED;
+ rset->pending--;
+ }
+
+ INIT_LIST_HEAD(&rset->resources[i].link);
+ rset->resources[i].parent = rset;
+ rset->resources[i].state = link_state;
+ list_add_tail(&rset->resources[i].link, &resource_list[i]->waiters.link);
+ }
+
+ if (i < number_resources)
+ {
+ /* an overflow was detected, roll back */
+ while (i--)
+ {
+ list_del(&rset->resources[i].link);
+ resource_list[i]->lock_count--;
+ }
+ mutex_unlock(&kds_lock);
+ kfree(rset);
+ return ERR_PTR(-EFAULT);
+ }
+
+ if (rset->pending == 0)
+ triggered = 1;
+ else
+ {
+ rset->cb = &sync_cb;
+ rset->callback_parameter = &wake;
+ rset->callback_extra_parameter = NULL;
+ }
+
+ mutex_unlock(&kds_lock);
+
+ if (!triggered)
+ {
+ long wait_res;
+ if ( KDS_WAIT_BLOCKING == jiffies_timeout )
+ {
+ wait_res = wait_event_interruptible(wake, rset->pending == 0);
+ }
+ else
+ {
+ wait_res = wait_event_interruptible_timeout(wake, rset->pending == 0, jiffies_timeout);
+ }
+ if ((wait_res == -ERESTARTSYS) || (wait_res == 0))
+ {
+ /* use \a kds_resource_set_release to roll back */
+ kds_resource_set_release(&rset);
+ return ERR_PTR(wait_res);
+ }
+ }
+ return rset;
+}
+EXPORT_SYMBOL(kds_waitall);
+
+void kds_resource_set_release(struct kds_resource_set ** pprset)
+{
+ struct list_head triggered = LIST_HEAD_INIT(triggered);
+ struct kds_resource_set * rset;
+ struct kds_resource_set * it;
+ int i;
+
+ BUG_ON(!pprset);
+
+ mutex_lock(&kds_lock);
+
+ rset = *pprset;
+ if (!rset)
+ {
+ /* caught a race between a cancelation
+ * and a completion, nothing to do */
+ mutex_unlock(&kds_lock);
+ return;
+ }
+
+ /* clear user pointer so we'll be the only
+ * thread handling the release */
+ *pprset = NULL;
+
+ for (i = 0; i < rset->num_resources; i++)
+ {
+ struct kds_resource * resource;
+ struct kds_link * it = NULL;
+
+ /* fetch the previous entry on the linked list */
+ it = list_entry(rset->resources[i].link.prev, struct kds_link, link);
+ /* unlink ourself */
+ list_del(&rset->resources[i].link);
+
+ /* any waiters? */
+ if (list_empty(&it->link))
+ continue;
+
+ /* were we the head of the list? (head if prev is a resource) */
+ if (it->parent != KDS_RESOURCE)
+ continue;
+
+ /* we were the head, find the kds_resource */
+ resource = container_of(it, struct kds_resource, waiters);
+
+ if (rset->locked_resources)
+ {
+ resource->lock_count--;
+ }
+
+ /* we know there is someone waiting from the any-waiters test above */
+
+ /* find the head of the waiting list */
+ it = list_first_entry(&resource->waiters.link, struct kds_link, link);
+
+ /* new exclusive owner? */
+ if (it->state & KDS_LINK_EXCLUSIVE)
+ {
+ /* link now triggered */
+ it->state |= KDS_LINK_TRIGGERED;
+ /* a parent to update? */
+ if (it->parent != KDS_IGNORED)
+ {
+ if (0 == --it->parent->pending)
+ {
+ /* new owner now triggered, track for callback later */
+ list_add(&it->parent->callback_link, &triggered);
+ }
+ }
+ }
+ /* exclusive releasing ? */
+ else if (rset->resources[i].state & KDS_LINK_EXCLUSIVE)
+ {
+ /* trigger non-exclusive until end-of-list or first exclusive */
+ list_for_each_entry(it, &resource->waiters.link, link)
+ {
+ /* exclusive found, stop triggering */
+ if (it->state & KDS_LINK_EXCLUSIVE)
+ break;
+
+ it->state |= KDS_LINK_TRIGGERED;
+ /* a parent to update? */
+ if (it->parent != KDS_IGNORED)
+ {
+ if (0 == --it->parent->pending)
+ {
+ /* new owner now triggered, track for callback later */
+ list_add(&it->parent->callback_link, &triggered);
+ }
+ }
+ }
+ }
+
+ }
+
+ mutex_unlock(&kds_lock);
+
+ while (!list_empty(&triggered))
+ {
+ it = list_first_entry(&triggered, struct kds_resource_set, callback_link);
+ list_del(&it->callback_link);
+ kds_callback_perform(it);
+ }
+
+ cancel_work_sync(&rset->callback_work);
+
+ /* free the resource set */
+ kfree(rset);
+}
+EXPORT_SYMBOL(kds_resource_set_release);
+
+MODULE_LICENSE("GPL");
+MODULE_AUTHOR("ARM Ltd.");
+MODULE_VERSION("1.0");
diff --git a/include/linux/kds.h b/include/linux/kds.h
new file mode 100644
index 0000000..65e5706
--- /dev/null
+++ b/include/linux/kds.h
@@ -0,0 +1,154 @@
+/*
+ *
+ * (C) COPYRIGHT 2012 ARM Limited. All rights reserved.
+ *
+ * This program is free software and is provided to you under the terms of the GNU General Public License version 2
+ * as published by the Free Software Foundation, and any use by you of this program is subject to the terms of such GNU licence.
+ *
+ * A copy of the licence is included with the program, and can also be obtained from Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA.
+ *
+ */
+
+
+
+#ifndef _KDS_H_
+#define _KDS_H_
+
+#include <linux/list.h>
+#include <linux/workqueue.h>
+
+#define KDS_WAIT_BLOCKING (ULONG_MAX)
+
+/* what to do when waitall must wait for a synchronous locked resource: */
+#define KDS_FLAG_LOCKED_FAIL (0u << 0) /* fail waitall */
+#define KDS_FLAG_LOCKED_IGNORE (1u << 0) /* don't wait, but block others that wait */
+#define KDS_FLAG_LOCKED_WAIT (2u << 0) /* wait (normal) */
+#define KDS_FLAG_LOCKED_ACTION (3u << 0) /* mask to extract the action to do on locked resources */
+
+struct kds_resource_set;
+
+typedef void (*kds_callback_fn) (void * callback_parameter, void * callback_extra_parameter);
+
+struct kds_callback
+{
+ kds_callback_fn user_cb; /* real cb */
+ int direct; /* do direct or queued call? */
+ struct workqueue_struct * wq;
+};
+
+struct kds_link
+{
+ struct kds_resource_set * parent;
+ struct list_head link;
+ unsigned long state;
+};
+
+struct kds_resource
+{
+ struct kds_link waiters;
+ unsigned long lock_count;
+};
+
+/* callback API */
+
+/* Initialize a callback object.
+ *
+ * Typically created per context or per hw resource.
+ *
+ * Callbacks can be performed directly if no nested locking can
+ * happen in the client.
+ *
+ * Nested locking can occur when a lock is held during the kds_async_waitall or
+ * kds_resource_set_release call. If the callback needs to take the same lock,
+ * nested locking will happen.
+ *
+ * If nested locking could happen, non-direct callbacks can be requested.
+ * Callbacks will then be called asynchronously to the triggering call.
+ */
+int kds_callback_init(struct kds_callback * cb, int direct, kds_callback_fn user_cb);
+
+/* Terminate the use of a callback object.
+ *
+ * If the callback object was set up as non-direct,
+ * any pending callbacks will be flushed first.
+ * Note that to avoid a deadlock, the locks the callbacks need
+ * can't be held when a callback object is terminated.
+ */
+void kds_callback_term(struct kds_callback * cb);
+
+
+/* resource object API */
+
+/* initialize a resource handle for a shared resource */
+void kds_resource_init(struct kds_resource * resource);
+
+/*
+ * Will assert if the resource is being used or waited on.
+ * The caller should NOT try to terminate a resource that could still have clients.
+ * After the function returns the resource is no longer known by kds.
+ */
+void kds_resource_term(struct kds_resource * resource);
+
+/* Asynchronous wait for a set of resources.
+ * Callback will be called when all resources are available.
+ * If all the resources were available, the callback will be called before kds_async_waitall returns.
+ * So one must not hold any locks the callback code-flow can take when calling kds_async_waitall.
+ * The caller is considered to own/use the resources until \a kds_resource_set_release is called.
+ * flags is one or more of the KDS_FLAG_* set.
+ * exclusive_access_bitmap is a bitmap where a high bit means exclusive access while a low bit means shared access.
+ * Use the Linux __set_bit API, where the index of the buffer to control is used as the bit index.
+ *
+ * Standard Linux error return value.
+ */
+int kds_async_waitall(
+ struct kds_resource_set ** pprset,
+ unsigned long flags,
+ struct kds_callback * cb,
+ void * callback_parameter,
+ void * callback_extra_parameter,
+ int number_resources,
+ unsigned long * exclusive_access_bitmap,
+ struct kds_resource ** resource_list);
+
+/* Synchronous wait for a set of resources.
+ * Function will return when one of these have happened:
+ * - all resources have been obtained
+ * - timeout lapsed while waiting
+ * - a signal was received while waiting
+ *
+ * The caller is considered to own/use the resources when the function returns.
+ * The caller must release the resources using \a kds_resource_set_release.
+ *
+ * Calling this function while holding already locked resources or other locking primitives is dangerous.
+ * If this is needed, one must decide on a lock order of the resources and/or the other locking primitives
+ * and always take the resources/locking primitives in that specific order.
+ *
+ * Use the ERR_PTR framework to decode the return value.
+ * NULL = time out
+ * If IS_ERR then PTR_ERR gives:
+ * ERESTARTSYS = signal received, retry call after signal
+ * all other values = internal error, lock failed
+ * Other values = successful wait, now the owner, must call kds_resource_set_release
+ */
+struct kds_resource_set * kds_waitall(
+ int number_resources,
+ unsigned long * exclusive_access_bitmap,
+ struct kds_resource ** resource_list,
+ unsigned long jiffies_timeout);
+
+/* Release resources after use.
+ * Caller must handle that other async callbacks will trigger,
+ * so must avoid holding any locks a callback will take.
+ *
+ * The function takes a pointer to your pointer to handle a race
+ * between a cancelation and a completion.
+ *
+ * If the caller can't guarantee that a race can't occur then
+ * the passed in pointer must be the same in both call paths
+ * to allow kds to manage the potential race.
+ */
+void kds_resource_set_release(struct kds_resource_set ** pprset);
+
+#endif /* _KDS_H_ */
+
>
> For the last few months we (ARM MPD... "The Mali guys") have been working on
> getting X.Org up and running with Mali T6xx (ARM's next-generation GPU IP).
> The approach is very similar (well identical I think) to how things work on
> OMAP: We use a DRM driver to manage the display controller via KMS. The KMS
> driver also allocates both scan-out and pixmap/back buffers via the
> DRM_IOCTL_MODE_CREATE_DUMB ioctl which is internally implemented with GEM.
> When returning buffers to DRI clients, the x-server uses flink to get a
> global handle to a buffer which it passes back to the DRI client (in our
> case the Mali-T600 X11 EGL winsys). The client then uses the new PRIME
> ioctls to export the GEM buffer it received from the x-server to a dma_buf
> fd. This fd is then passed into the T6xx kernel driver via our own job
> dispatch user/kernel API (we're not using DRM for driving the GPU, only the
> display controller).
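The client-side flink-to-dma_buf step described above can be sketched with the standard DRM ioctls. This is a hedged illustration only, with error handling trimmed; the function name is invented, and it assumes the uapi `<drm/drm.h>` header with `DRM_IOCTL_GEM_OPEN` and the then-new `DRM_IOCTL_PRIME_HANDLE_TO_FD`:

```c
/* Hedged sketch of the flow described above: open the flink name
 * received from the X server, then export the resulting GEM handle
 * to a dma_buf fd via the PRIME ioctl.  drm_fd is an already-open
 * DRM device node; flink_name came back through DRI2. */
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <drm/drm.h>

int gem_name_to_dmabuf_fd(int drm_fd, uint32_t flink_name)
{
	struct drm_gem_open open_arg;
	struct drm_prime_handle prime_arg;

	memset(&open_arg, 0, sizeof(open_arg));
	open_arg.name = flink_name;	/* global name from the X server */
	if (ioctl(drm_fd, DRM_IOCTL_GEM_OPEN, &open_arg))
		return -1;

	memset(&prime_arg, 0, sizeof(prime_arg));
	prime_arg.handle = open_arg.handle;
	prime_arg.flags = DRM_CLOEXEC;
	if (ioctl(drm_fd, DRM_IOCTL_PRIME_HANDLE_TO_FD, &prime_arg))
		return -1;

	return prime_arg.fd;	/* pass this fd to the GPU kernel driver */
}
```

The returned fd is what would then be handed to the T6xx job-dispatch interface in place of a driver-private buffer handle.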
So using dumb in this way is probably a bit of an abuse, since dumb is defined
to provide buffers not to be used for acceleration hw. Since when we allocate
dumb buffers, we can't know what special hw layouts are required (tiling etc)
for optimal performance for accel. The logic to work that out is rarely generic.
>
> http://git.linaro.org/gitweb?p=arm/xorg/driver/xf86-video-armsoc.git;a=summa
> ry
>
> Note: When we originally spoke to Rob Clark about this, he suggested we take
> the already-generic xf86-video-modesetting and just add the dri2 code to it.
> This is indeed how we started out, however as we progressed it became clear
> that the majority of the code we wanted was in the omap driver and were
> having to work fairly hard to keep some of the original modesetting code.
> This is why we've now changed tactic and just forked the OMAP driver,
> something Rob is more than happy for us to do.
It does seem like porting to -modesetting would be better, and maybe cleaning up
modesetting if it needs it. The modesetting driver is pretty much just a
make-it-work port of the radeon/nouveau/intel "shared" code.
> One thing the DDX driver isn't doing yet is making use of 2D hw blocks. In
> the short-term, we will simply create a branch off of the "generic" master
> for each SoC and add 2D hardware support there. We do however want a more
> permanent solution which doesn't need a separate branch per SoC. Some of the
> suggested solutions are:
>
> * Add a new generic DRM ioctl API for larger 2D operations (I would imagine
> small blits/blends would be done in SW).
Not going to happen, again the hw isn't generic in this area, some hw requires
3D engines to do 2D ops etc. The limitations on some hw with overlaps etc,
and finally it breaks the rule about generic ioctls for acceleration operations.
> * Use SW rendering for everything other than solid blits and use v4l2's
> blitting API for those (importing/exporting buffers to be blitted using
> dma_buf). The theory here is that most UIs are rendered with GLES and so you
> only need 2D hardware for blits. I think we'll prototype this approach on
> Exynos.
Seems a bit over the top.
> * Define a new x-server sub-module interface to allow a seperate .so 2D
> driver to be loaded (this is the approach the current OMAP DDX uses).
This seems the sanest.
I haven't time this week to review the code, but I'll try and take a look when
time permits.
Dave.
Some minor inline documentation fixes for gaps resulting from new patches.
Signed-off-by: Sumit Semwal <sumit.semwal(a)ti.com>
---
drivers/base/dma-buf.c | 9 +++++----
include/linux/dma-buf.h | 3 +++
2 files changed, 8 insertions(+), 4 deletions(-)
diff --git a/drivers/base/dma-buf.c b/drivers/base/dma-buf.c
index c3c88b0..24e88fe 100644
--- a/drivers/base/dma-buf.c
+++ b/drivers/base/dma-buf.c
@@ -429,7 +429,7 @@ EXPORT_SYMBOL_GPL(dma_buf_kunmap);
/**
* dma_buf_mmap - Setup up a userspace mmap with the given vma
- * @dma_buf: [in] buffer that should back the vma
+ * @dmabuf: [in] buffer that should back the vma
* @vma: [in] vma for the mmap
* @pgoff: [in] offset in pages where this mmap should start within the
* dma-buf buffer.
@@ -470,8 +470,9 @@ int dma_buf_mmap(struct dma_buf *dmabuf, struct vm_area_struct *vma,
EXPORT_SYMBOL_GPL(dma_buf_mmap);
/**
- * dma_buf_vmap - Create virtual mapping for the buffer object into kernel address space. Same restrictions as for vmap and friends apply.
- * @dma_buf: [in] buffer to vmap
+ * dma_buf_vmap - Create virtual mapping for the buffer object into kernel
+ * address space. Same restrictions as for vmap and friends apply.
+ * @dmabuf: [in] buffer to vmap
*
* This call may fail due to lack of virtual mapping address space.
* These calls are optional in drivers. The intended use for them
@@ -491,7 +492,7 @@ EXPORT_SYMBOL_GPL(dma_buf_vmap);
/**
* dma_buf_vunmap - Unmap a vmap obtained by dma_buf_vmap.
- * @dma_buf: [in] buffer to vmap
+ * @dmabuf: [in] buffer to vunmap
*/
void dma_buf_vunmap(struct dma_buf *dmabuf, void *vaddr)
{
diff --git a/include/linux/dma-buf.h b/include/linux/dma-buf.h
index a02b1ff..eb48f38 100644
--- a/include/linux/dma-buf.h
+++ b/include/linux/dma-buf.h
@@ -65,6 +65,9 @@ struct dma_buf_attachment;
* mapping needs to be coherent - if the exporter doesn't directly
* support this, it needs to fake coherency by shooting down any ptes
* when transitioning away from the cpu domain.
+ * @vmap: [optional] creates a virtual mapping for the buffer into kernel
+ * address space. Same restrictions as for vmap and friends apply.
+ * @vunmap: [optional] unmaps a vmap from the buffer
*/
struct dma_buf_ops {
int (*attach)(struct dma_buf *, struct device *,
--
1.7.9.5
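The vmap/vunmap pair this patch documents is optional for exporters, so importers must cope with failure. A minimal hedged sketch of the importer side, with a placeholder function name and the error path the kerneldoc above implies:

```c
/* Hedged sketch of an importer using the optional vmap interface
 * documented above.  importer_cpu_access is a made-up name; real code
 * must tolerate failure since not every exporter implements the op. */
#include <linux/dma-buf.h>
#include <linux/errno.h>

static int importer_cpu_access(struct dma_buf *dmabuf)
{
	void *vaddr;

	vaddr = dma_buf_vmap(dmabuf);	/* may fail: op is optional and
					 * vmap address space is limited */
	if (!vaddr)
		return -ENOMEM;

	/* ... CPU access to the whole buffer through vaddr ... */

	dma_buf_vunmap(dmabuf, vaddr);
	return 0;
}
```

Every successful dma_buf_vmap must be balanced by a dma_buf_vunmap on the same address, mirroring the vmap/vunmap rules the kerneldoc refers to.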