Hello Everyone,
A very happy new year 2012! :)
This patchset is an RFC for the way videobuf2 can be adapted to add support for
DMA buffer sharing framework[1].
The original patch-set for the idea, and PoC of buffer sharing was by
Tomasz Stanislawski <t.stanislaws(a)samsung.com>, who demonstrated buffer sharing
between two v4l2 devices[2]. This RFC is needed to adapt these patches to the
changes that have happened in the DMA buffer sharing framework over the past few
months.
To begin with, I have tried to adapt only the dma-contig allocator, and only as
a user of dma-buf buffers. I am currently working on the v4l2-as-an-exporter
changes, and will share them as soon as they take some shape.
As with the PoC [2], the handle for sharing buffers is a file-descriptor (fd).
The usage documentation is also a part of [1].
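To illustrate, here is a minimal userspace sketch of queueing such a buffer,
based on the API these patches propose (the dma-buf fd is assumed to come from
some exporting device; error handling is left out):

#include <string.h>
#include <sys/ioctl.h>
#include <linux/videodev2.h>

/* vfd is the open video device, dmabuf_fd the fd received from the exporter. */
static int queue_dmabuf(int vfd, int dmabuf_fd, unsigned int index)
{
	struct v4l2_buffer buf;

	memset(&buf, 0, sizeof(buf));
	buf.type = V4L2_BUF_TYPE_VIDEO_CAPTURE;
	buf.memory = V4L2_MEMORY_DMABUF;	/* memory type added by patch 1 */
	buf.index = index;
	buf.m.fd = dmabuf_fd;			/* the fd is the sharing handle */

	return ioctl(vfd, VIDIOC_QBUF, &buf);
}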
So, the current RFC has the following limitations:
- it supports buffer sharing only as a buffer user,
- it doesn't handle cases where, even for a contiguous buffer, the sg_table has
more than one scatterlist entry.
Thanks and best regards,
~Sumit.
[1]: dma-buf patchset at: https://lkml.org/lkml/2011/12/26/29
[2]: http://lwn.net/Articles/454389
Sumit Semwal (4):
v4l: Add DMABUF as a memory type
v4l:vb2: add support for shared buffer (dma_buf)
v4l:vb: remove warnings about MEMORY_DMABUF
v4l:vb2: Add dma-contig allocator as dma_buf user
drivers/media/video/videobuf-core.c | 4 +
drivers/media/video/videobuf2-core.c | 186 +++++++++++++++++++++++++++-
drivers/media/video/videobuf2-dma-contig.c | 125 +++++++++++++++++++
include/linux/videodev2.h | 8 ++
include/media/videobuf2-core.h | 30 +++++
5 files changed, 352 insertions(+), 1 deletions(-)
--
1.7.5.4
The PASR framework brings support for the Partial Array Self-Refresh DDR power
management feature. PASR was introduced with LP-DDR2, and is also present
in DDR3.
PASR provides 4 modes:
* Single-Ended: Only 1/1, 1/2, 1/4 or 1/8 of the die is refreshed, with masking
starting at the end of the DDR die.
* Double-Ended: Same as Single-Ended, but refresh masking does not necessarily
start at the end of the DDR die.
* Bank-Selective: Refresh of each bank of a die can be masked or unmasked via
a dedicated DDR register (MR16). This mode is convenient for DDR configured
in BRC (Bank-Row-Column) mode.
* Segment-Selective: Refresh of each segment of a die can be masked or unmasked
via a dedicated DDR register (MR17). This mode is convenient for DDR configured
in RBC (Row-Bank-Column) mode.
The role of this framework is to stop the refresh of unused memory in order to
reduce DDR power consumption.
It supports the Bank-Selective and Segment-Selective modes, as these are the
best suited to modern OSes.
At an early boot stage, a representation of the physical DDR layout is built:
Die 0
_______________________________
| I--------------------------I |
| I Bank or Segment 0 I |
| I--------------------------I |
| I--------------------------I |
| I Bank or Segment 1 I |
| I--------------------------I |
| I--------------------------I |
| I Bank or Segment ... I |
| I--------------------------I |
| I--------------------------I |
| I Bank or Segment n I |
| I--------------------------I |
|______________________________|
...
Die n
_______________________________
| I--------------------------I |
| I Bank or Segment 0 I |
| I--------------------------I |
| I--------------------------I |
| I Bank or Segment 1 I |
| I--------------------------I |
| I--------------------------I |
| I Bank or Segment ... I |
| I--------------------------I |
| I--------------------------I |
| I Bank or Segment n I |
| I--------------------------I |
|______________________________|
The first level is a table whose elements each represent a die:
* Base address,
* Number of segments,
* Table representing banks/segments,
* MR16/MR17 refresh mask,
* DDR Controller callback to update MR16/MR17 refresh mask.
The second level is the section tables representing the banks or segments,
depending on the hardware configuration:
* Base address,
* Unused memory size counter,
* Possible pointer to another section it depends on (e.g. because of
interleaving).
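As an illustrative sketch of these two levels (the type and field names here
are mine, not necessarily those of the patches):

#include <linux/types.h>

#define PASR_MAX_SECTION_NR 8	/* illustrative bound, not from the patches */

/* Second level: one bank or segment. */
struct pasr_section {
	phys_addr_t start;		/* base address */
	unsigned long free_size;	/* unused memory size counter */
	struct pasr_section *pair;	/* section this one depends on (interleaving) */
};

/* First level: one DDR die. */
struct pasr_die {
	phys_addr_t start;		/* base address */
	int nr_sections;		/* number of banks or segments */
	struct pasr_section section[PASR_MAX_SECTION_NR];
	u16 mem_reg;			/* MR16/MR17 refresh mask */
	/* DDR controller callback to apply the updated refresh mask */
	void (*apply_mask)(u16 mem_reg, void *cookie);
	void *cookie;
};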
When some memory becomes unused, the allocator owning this memory calls the
PASR framework's pasr_put(phys_addr, size) function. The framework finds the
sections impacted and updates their counters accordingly.
If a section counter reaches the section size, the refresh of the section is
masked. If the section has a dependency on another section (e.g. because of
DDR interleaving), it first checks that the "paired" section is also unused
before updating the refresh mask.
When some unused memory is requested back by the allocator, the allocator
owning this memory calls the PASR framework's pasr_get(phys_addr, size)
function. The framework finds the sections impacted and updates their counters
accordingly. If, before the update, a section counter was equal to the section
size, the refresh of the section is unmasked. If the section has a dependency
on another section, the refresh of the other section is unmasked as well.
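A simplified sketch of this counter logic, assuming the range fits within a
single section and ignoring locking (the lookup and mask helpers are
illustrative; only pasr_put/pasr_get are named by this cover letter):

/* Memory becomes unused: bump the counter; mask refresh when fully free. */
void pasr_put(phys_addr_t paddr, unsigned long size)
{
	struct pasr_section *s = pasr_addr2section(paddr);

	s->free_size += size;
	if (s->free_size < section_size(s))
		return;

	/*
	 * The section is now entirely unused. Mask its refresh, but only
	 * if its interleaved pair (if any) is entirely unused as well.
	 */
	if (!s->pair || s->pair->free_size == section_size(s->pair))
		pasr_mask_refresh(s);
}

/* Unused memory is handed back to a user: unmask refresh first. */
void pasr_get(phys_addr_t paddr, unsigned long size)
{
	struct pasr_section *s = pasr_addr2section(paddr);

	if (s->free_size == section_size(s)) {
		pasr_unmask_refresh(s);
		if (s->pair)
			pasr_unmask_refresh(s->pair);
	}
	s->free_size -= size;
}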
Patch 3/6 contains the modifications made to the Buddy allocator. The overhead
induced is very low because the PASR framework is notified only on "MAX_ORDER"
pageblocks, as sketched below.
Support for other allocators (PMEM, HWMEM, ...) and for memory hotplug will be
added in future revisions of this patch set.
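A sketch of what the page allocator hooks could look like, with hypothetical
pasr_kput()/pasr_kget() helpers translating a (page, order) pair into the
pasr_put()/pasr_get() calls above:

/* mm/page_alloc.c, __free_one_page(): after buddy coalescing completes. */
if (order == MAX_ORDER - 1)
	pasr_kput(page, order);	/* a whole MAX_ORDER pageblock is now free */

/* mm/page_alloc.c, __rmqueue_smallest(): a block leaves the free lists. */
if (current_order == MAX_ORDER - 1)
	pasr_kget(page, order);	/* the pageblock is back in use */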
Maxime Coquelin (6):
PASR: Initialize DDR layout
PASR: Add core Framework
PASR: mm: Integrate PASR in Buddy allocator
PASR: Call PASR initialization
PASR: Add Documentation
PASR: Ux500: Add PASR support
Documentation/pasr.txt | 183 ++++++++++++
arch/arm/Kconfig | 1 +
arch/arm/kernel/setup.c | 1 +
arch/arm/mach-ux500/include/mach/hardware.h | 11 +
arch/arm/mach-ux500/include/mach/memory.h | 8 +
drivers/mfd/db8500-prcmu.c | 67 +++++
drivers/staging/Kconfig | 2 +
drivers/staging/Makefile | 1 +
drivers/staging/pasr/Kconfig | 19 ++
drivers/staging/pasr/Makefile | 6 +
drivers/staging/pasr/core.c | 168 +++++++++++
drivers/staging/pasr/helper.c | 84 ++++++
drivers/staging/pasr/helper.h | 16 +
drivers/staging/pasr/init.c | 403 +++++++++++++++++++++++++++
drivers/staging/pasr/ux500.c | 58 ++++
include/linux/pasr.h | 143 ++++++++++
include/linux/ux500-pasr.h | 11 +
init/main.c | 8 +
mm/page_alloc.c | 9 +
19 files changed, 1199 insertions(+), 0 deletions(-)
create mode 100644 Documentation/pasr.txt
create mode 100644 drivers/staging/pasr/Kconfig
create mode 100644 drivers/staging/pasr/Makefile
create mode 100644 drivers/staging/pasr/core.c
create mode 100644 drivers/staging/pasr/helper.c
create mode 100644 drivers/staging/pasr/helper.h
create mode 100644 drivers/staging/pasr/init.c
create mode 100644 drivers/staging/pasr/ux500.c
create mode 100644 include/linux/pasr.h
create mode 100644 include/linux/ux500-pasr.h
--
1.7.8
When using DMABUF streaming in non-planar mode, the v4l2_buffer::length
field holds the length of the buffer as required by userspace. Copy it
to the length of the first plane at QBUF time, as the plane length is
later checked against the dma-buf size.
Signed-off-by: Laurent Pinchart <laurent.pinchart(a)ideasonboard.com>
---
drivers/media/video/videobuf2-core.c | 1 +
1 files changed, 1 insertions(+), 0 deletions(-)
diff --git a/drivers/media/video/videobuf2-core.c b/drivers/media/video/videobuf2-core.c
index 29cf6ed..8eb4d08 100644
--- a/drivers/media/video/videobuf2-core.c
+++ b/drivers/media/video/videobuf2-core.c
@@ -927,6 +927,7 @@ static int __fill_vb2_buffer(struct vb2_buffer *vb, const struct v4l2_buffer *b,
}
if (b->memory == V4L2_MEMORY_DMABUF) {
v4l2_planes[0].m.fd = b->m.fd;
+ v4l2_planes[0].length = b->length;
}
}
--
1.7.3.4
From: "joro(a)8bytes.org" <joro(a)8bytes.org>
Subject: Re: [PATCH 2/2] ARM: IOMMU: Tegra30: Add iommu_ops for SMMU driver
Date: Tue, 24 Jan 2012 15:25:21 +0100
Message-ID: <20120124142521.GE6269(a)8bytes.org>
> On Tue, Jan 24, 2012 at 03:46:01PM +0200, Felipe Balbi wrote:
> > On Tue, Jan 24, 2012 at 02:41:21PM +0100, Hiroshi Doyu wrote:
> > > Actually I really like the concept of this "domain" now, which hides
> > > the H/W hierarchy from users.
> > >
> > > But in the Tegra SMMU/GART case, there's a single IOMMU device in the
> > > system. Keeping an IOMMU device list in a domain and iterating over
> > > that list in each iommu_ops callback seems nice, but I'm afraid that
> > > this may be a bit too much when one already knows that there's only
> > > one IOMMU device in the system.
> > >
> > > If there's no actual problem with a 1:1 mapping between IOMMU H/W and
> > > domains, I think that it may not be so bad to keep the original (1:1)
> > > code for GART and SMMU. What do you think?
> >
> > I think it boils down to "extensibility". If you can truly/fully
> > guarantee that there will *always* be a single IOMMU on all upcoming
> > Tegras, then it's really overkill.
> >
> > But if there's even a remote possibility of the HW being changed and you
> > end up with more IOMMUs, things start to feel necessary for the sake of
> > making it easy to extend.
>
> Right. But I am fine with the logic as-is when there is only one SMMU in
> the system. But please also change the IOMMU driver so that it really
> only initializes a single SMMU. When boards pop up with more than one,
> we will notice that assumption in the code again and be reminded to
> change it.
Fixed.
I'll revisit 4MB page-size support and the multiple-IOMMU-devices-per-domain
support discussed above later.
Attached is the updated patch.
Hello Everyone,
After some discussion in RFC form, here is the patch-set introducing the
DMA buffer sharing mechanism - the change history is in the changelog below.
Various subsystems - V4L2, GPU-accessors, DRI to name a few - have felt the
need to have a common mechanism to share memory buffers across different
devices - ARM, video hardware, GPU.
This need comes forth from a variety of use cases including cameras, image
processing, video recorders, sound processing, DMA engines, GPU and display
buffers, amongst others.
This patch attempts to define such a buffer sharing mechanism - it is the
result of discussions from a couple of memory-management mini-summits held by
Linaro to understand and address common needs around memory management. [1]
A new dma_buf buffer object is added, with operations and API to allow easy
sharing of this buffer object across devices.
The framework allows:
- a new buffer object to be created with fixed size, associated with a file
pointer and allocator-defined operations for this buffer object. This
operation is called the 'export' operation.
- different devices to 'attach' themselves to this buffer object, to facilitate
backing storage negotiation, using dma_buf_attach() API.
- this exported buffer object to be shared with the other entity by asking for
its 'file-descriptor (fd)', and sharing the fd across.
- a received fd to get the buffer object back, where it can be accessed using
the associated exporter-defined operations.
- the exporter and user to share the buffer object's scatterlist using
map_dma_buf and unmap_dma_buf operations.
Documentation present in the patch-set gives more details.
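As an illustration, the importer side could look roughly like this; the
signatures follow the API as it later stabilized in mainline and may differ
in detail from this version of the patch-set:

#include <linux/dma-buf.h>
#include <linux/dma-direction.h>
#include <linux/err.h>

/* Import a buffer received as an fd and retrieve its scatterlist. */
static int import_buffer(struct device *dev, int fd)
{
	struct dma_buf *dmabuf;
	struct dma_buf_attachment *attach;
	struct sg_table *sgt;
	int ret = 0;

	dmabuf = dma_buf_get(fd);		/* fd -> buffer object */
	if (IS_ERR(dmabuf))
		return PTR_ERR(dmabuf);

	attach = dma_buf_attach(dmabuf, dev);	/* backing storage negotiation */
	if (IS_ERR(attach)) {
		ret = PTR_ERR(attach);
		goto err_put;
	}

	sgt = dma_buf_map_attachment(attach, DMA_BIDIRECTIONAL);
	if (IS_ERR(sgt)) {
		ret = PTR_ERR(sgt);
		goto err_detach;
	}

	/* ... program the device using the pages described by sgt ... */

	dma_buf_unmap_attachment(attach, sgt, DMA_BIDIRECTIONAL);
err_detach:
	dma_buf_detach(dmabuf, attach);
err_put:
	dma_buf_put(dmabuf);
	return ret;
}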
For the first version, dma-buf is marked as an EXPERIMENTAL driver, which we
can remove in later versions after additional usage and testing.
*IMPORTANT*: [see https://lkml.org/lkml/2011/12/20/211 for more details]
For this first version, a buffer shared using the dma_buf sharing API:
- *may* be exported to user space using "mmap" *ONLY* by the exporter, outside
of this framework.
- may be used *ONLY* by importers that do not need CPU access to the buffer.
This is based on design suggestions from many people at the mini-summits,
most notably from Arnd Bergmann <arnd(a)arndb.de>, Rob Clark <rob(a)ti.com> and
Daniel Vetter <daniel(a)ffwll.ch>.
The implementation is inspired from proof-of-concept patch-set from
Tomasz Stanislawski <t.stanislaws(a)samsung.com>, who demonstrated buffer sharing
between two v4l2 devices. [2]
Some sample implementations and WIP for dma-buf users and exporters are
available at [3] and [4]. [These are not being submitted for discussion /
inclusion right now, but are for reference only]
References:
[1]: https://wiki.linaro.org/OfficeofCTO/MemoryManagement
[2]: http://lwn.net/Articles/454389
[3]: Dave Airlie's prime support:
http://cgit.freedesktop.org/~airlied/linux/log/?h=drm-prime-dmabuf
[4]: Rob Clark's sharing between DRM and V4L2:
https://github.com/robclark/kernel-omap4/commits/drmplane-dmabuf
The patchset is based on top of 3.2-rc7; the current version can be found at
http://git.linaro.org/gitweb?p=people/sumitsemwal/linux-3.x.git
Branch: dmabuf-patch-v1
Earlier versions:
RFC:
v3 at: https://lkml.org/lkml/2011/12/19/50
v2 at: https://lkml.org/lkml/2011/12/2/53
v1 at: https://lkml.org/lkml/2011/10/11/92
Wish you all happy vacations and a very happy, joyous and prosperous new year
2012 :)
Best regards,
~Sumit Semwal
History:
v4:
- Review comments incorporated:
- from Konrad Rzeszutek Wilk [https://lkml.org/lkml/2011/12/20/209]
- corrected language in some comments
- re-ordered struct definitions for readability
- added might_sleep() call in dma_buf_map_attachment() wrapper
- from Rob Clark [https://lkml.org/lkml/2011/12/23/196]
- Made dma-buf EXPERIMENTAL for 1st version.
v3:
- Review comments incorporated:
- from Konrad Rzeszutek Wilk [https://lkml.org/lkml/2011/12/3/45]
- replaced BUG_ON with WARN_ON - various places
- added some error-checks
- replaced EXPORT_SYMBOL with EXPORT_SYMBOL_GPL
- some cosmetic / documentation comments
- from Arnd Bergmann, Daniel Vetter, Rob Clark
[https://lkml.org/lkml/2011/12/5/321]
- removed mmap() fop and dma_buf_op, also the sg_sync* operations, and
documented that mmap is not allowed for exported buffer
- updated documentation to clearly state when migration is allowed
- changed kconfig
- some error code checks
- from Rob Clark [https://lkml.org/lkml/2011/12/5/572]
- update documentation to allow map_dma_buf to return -EINTR
v2:
- Review comments incorporated:
- from Tomasz Stanislawski [https://lkml.org/lkml/2011/10/14/136]
- kzalloc moved out of critical section
- corrected some in-code comments
- from Dave Airlie [https://lkml.org/lkml/2011/11/25/123]
- from Daniel Vetter and Rob Clark [https://lkml.org/lkml/2011/11/26/53]
- use struct sg_table in place of struct scatterlist
- rename {get,put}_scatterlist to {map,unmap}_dma_buf
- add new wrapper APIs dma_buf_{map,unmap}_attachment for ease of users
- documentation updates as per review comments from Randy Dunlap
[https://lkml.org/lkml/2011/10/12/439]
v1: original
Sumit Semwal (3):
dma-buf: Introduce dma buffer sharing mechanism
dma-buf: Documentation for buffer sharing framework
dma-buf: mark EXPERIMENTAL for 1st release.
Documentation/dma-buf-sharing.txt | 224 ++++++++++++++++++++++++++++
drivers/base/Kconfig | 11 ++
drivers/base/Makefile | 1 +
drivers/base/dma-buf.c | 291 +++++++++++++++++++++++++++++++++++++
include/linux/dma-buf.h | 176 ++++++++++++++++++++++
5 files changed, 703 insertions(+), 0 deletions(-)
create mode 100644 Documentation/dma-buf-sharing.txt
create mode 100644 drivers/base/dma-buf.c
create mode 100644 include/linux/dma-buf.h
--
1.7.5.4
Hello,
This is another update on my attempt at redesigning the DMA-mapping framework
for the ARM architecture. It includes a few minor changes since the last
version. We have focused mainly on the IOMMU mapper, keeping the DMA-mapping
redesign patches almost unchanged.
All patches have now been rebased onto the v3.2-rc4 kernel plus the IOMMU/next
branch to include the latest changes from the IOMMU kernel tree.
This series also contains support for mappings with pages larger than
4KiB, using the new, extended IOMMU API. This code has been provided by
Andrzej Pietrasiewicz.
All the code has been tested on the Samsung Exynos4 'UniversalC210' board
with the IOMMU driver posted by KyongHo Cho.
Git tree with all the patches (including some Samsung Exynos4 stuff):
http://git.infradead.org/users/kmpark/linux-samsung/shortlog/refs/heads/3.2…
git://git.infradead.org/users/kmpark/linux-samsung 3.2-rc4-dma-v5-samsung
History:
Initial version of the DMA-mapping redesign patches:
http://www.spinics.net/lists/linux-mm/msg21241.html
Second version of the patches:
http://lists.linaro.org/pipermail/linaro-mm-sig/2011-September/000571.html
http://lists.linaro.org/pipermail/linaro-mm-sig/2011-September/000577.html
Third version of the patches:
http://www.spinics.net/lists/linux-mm/msg25490.html
TODO:
- start the discussion about changing alloc_coherent into alloc_attrs in the
dma_map_ops structure (see the sketch below),
- start the discussion about the dma_mmap function,
- provide documentation for the new DMA attributes.
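For reference, the alloc_attrs change mentioned in the first TODO item would
turn the coherent allocation hooks into something roughly like this (a sketch
of the direction, not the final signatures):

#include <linux/dma-attrs.h>

struct dma_map_ops {
	void *(*alloc)(struct device *dev, size_t size,
		       dma_addr_t *dma_handle, gfp_t gfp,
		       struct dma_attrs *attrs);
	void (*free)(struct device *dev, size_t size,
		     void *cpu_addr, dma_addr_t dma_handle,
		     struct dma_attrs *attrs);
	/* ... the remaining ops stay as they are today ... */
};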
Best regards
--
Marek Szyprowski
Samsung Poland R&D Center
Patch summary:
Marek Szyprowski (8):
ARM: dma-mapping: remove offset parameter to prepare for generic
dma_ops
ARM: dma-mapping: use asm-generic/dma-mapping-common.h
ARM: dma-mapping: implement dma sg methods on top of any generic dma
ops
ARM: dma-mapping: move all dma bounce code to separate dma ops
structure
ARM: dma-mapping: remove redundant code and cleanup
common: dma-mapping: change alloc/free_coherent method to more
generic alloc/free_attrs
ARM: dma-mapping: use alloc, mmap, free from dma_ops
ARM: initial proof-of-concept IOMMU mapper for DMA-mapping
arch/arm/Kconfig | 9 +
arch/arm/common/dmabounce.c | 78 +++-
arch/arm/include/asm/device.h | 4 +
arch/arm/include/asm/dma-iommu.h | 36 ++
arch/arm/include/asm/dma-mapping.h | 404 +++++------------
arch/arm/mm/dma-mapping.c | 899 ++++++++++++++++++++++++++++++------
arch/arm/mm/vmregion.h | 2 +-
include/linux/dma-attrs.h | 1 +
include/linux/dma-mapping.h | 13 +-
9 files changed, 994 insertions(+), 452 deletions(-)
create mode 100644 arch/arm/include/asm/dma-iommu.h
--
1.7.1.569.g6f426
On Mon, Jan 09, 2012 at 10:37:28AM +0100, Thomas Hellstrom wrote:
> Hi!
>
> When TTM was originally written, it was assumed that GPU apertures
> could address pages directly, and that the CPU could access those
> pages without explicit synchronization. The process of binding a
> page to a GPU translation table was a simple one-step operation, and
> we needed to worry about fragmentation in the GPU aperture only.
>
> Now that we "sort of" support DMA memory there are three things I
> think are missing:
>
> 1) We can't gracefully handle coherent DMA OOMs or coherent DMA
> (Including CMA) memory fragmentation leading to failed allocations.
> 2) We can't handle dynamic mapping of pages into and out of dma, and
> corresponding IOMMU space shortage or fragmentation, and CPU
> synchronization.
> 3) We have no straightforward way of moving pages between devices.
>
> I think a reasonable way to support this is to make binding to a
> non-fixed (system page based) TTM memory type a two-step binding
> process, so that a TTM placement consists of (DMA_TYPE, MEMORY_TYPE)
> instead of only (MEMORY_TYPE).
>
> In step 1) the bo is bound to a specific DMA type. These could be
> for example:
> (NONE, DYNAMIC, COHERENT, CMA); device-dependent types could be
> allowed as well.
> In this step, we perform dma_sync_for_device, or allocate
> dma-specific pages maintaining LRU lists so that if we receive a DMA
> memory allocation OOM, we can unbind bo:s bound to the same DMA
> type. Standard graphics cards would then, for example, use the NONE
> DMA type when run on bare metal or COHERENT when run on Xen. A
> "COHERENT" OOM condition would then lead to eviction of another bo.
> (Note that DMA eviction might involve data copies and be costly, but
> still better than failing).
> Binding with the DYNAMIC memory type would mean that CPU accesses
> are disallowed, and that user-space CPU page mappings might need to
> be killed, with a corresponding sync_for_cpu if they are faulted in
> again (perhaps on a page-by-page basis). Any attempt to bo_kmap() a
> bo page bound to DYNAMIC DMA mapping should trigger a BUG.
>
> In step 2) The bo is bound to the GPU in the same way it's done
> today. Evicting from DMA will of course also trigger an evict from
> GPU, but an evict from GPU will not trigger a DMA evict.
>
> Making a bo "anonymous" and thus moveable between devices would then
> mean binding it to the "NONE" DMA type.
>
> Comments, suggestions?
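To make the proposal concrete, here is a rough sketch of the two-step
placement it describes (all names are hypothetical):

/* Step 1: the DMA types a bo could be bound to. */
enum ttm_dma_type {
	TTM_DMA_NONE,		/* pages usable directly; bo can stay "anonymous" */
	TTM_DMA_DYNAMIC,	/* mapped in/out of DMA; CPU access disallowed */
	TTM_DMA_COHERENT,	/* dma_alloc_coherent-backed, e.g. under Xen */
	TTM_DMA_CMA,		/* contiguous allocation from CMA */
};

/* A placement becomes a (DMA_TYPE, MEMORY_TYPE) pair instead of MEMORY_TYPE alone. */
struct ttm_placement_pair {
	enum ttm_dma_type dma_type;	/* step 1: DMA binding, with its own LRU */
	unsigned int mem_type;		/* step 2: GPU binding, as done today */
};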
Well, I think we need to solve the outstanding issues in the dma_buf framework
first. Currently dma_buf isn't really up to par for handling coherency
between the CPU and devices, and there's also not yet any way to handle DMA
address space fragmentation/exhaustion.
I fear that if you jump ahead with improving the TTM support alone, we
might end up with something incompatible with the stuff dma_buf will
eventually grow, resulting in a decent amount of wasted effort.
Cc'ed a bunch of relevant lists to foster input from people.
For a start, you seem to want much more low-level integration with the
DMA API than existing users commonly need. E.g. if I understand things
correctly, drivers just call dma_alloc_coherent and the platform/board code
then decides whether the device needs a contiguous allocation from CMA or
whether something else is good too (e.g. vmalloc for the CPU + IOMMU).
Another thing is that I think doing LRU eviction in case of DMA address
space exhaustion (or fragmentation) needs at least awareness of what's
going on in the upper layers. IOMMUs are commonly shared between devices,
and I presume that two TTM drivers sitting behind the same IOMMU and
fighting over its resources can lead to some hilarious outcomes.
Cheers, Daniel
--
Daniel Vetter
Mail: daniel(a)ffwll.ch
Mobile: +41 (0)79 365 57 48