The following series implements the updated API for wait/wound mutex locks.
The documentation and API should be complete; the implementation may not be
final. There is no support for -rt yet, and TASK_DEADLOCK handling is missing
too. However, I believe these are implementation details, and that the
interface will not behave differently for users of the API.
ww_acquire_ctx has been added, and with debugging enabled a whole lot of API
abuses are now correctly detected because of the extra state carried in
ww_acquire_ctx.
---
Maarten Lankhorst (3):
arch: make __mutex_fastpath_lock_retval return whether fastpath succeeded or not.
mutex: add support for wound/wait style locks, v3
mutex: Add ww tests to lib/locking-selftest.c. v3
Documentation/ww-mutex-design.txt | 322 +++++++++++++++++++++++++
arch/ia64/include/asm/mutex.h | 10 -
arch/powerpc/include/asm/mutex.h | 10 -
arch/sh/include/asm/mutex-llsc.h | 4
arch/x86/include/asm/mutex_32.h | 11 -
arch/x86/include/asm/mutex_64.h | 11 -
include/asm-generic/mutex-dec.h | 10 -
include/asm-generic/mutex-null.h | 2
include/asm-generic/mutex-xchg.h | 10 -
include/linux/mutex-debug.h | 1
include/linux/mutex.h | 257 ++++++++++++++++++++
kernel/mutex.c | 473 ++++++++++++++++++++++++++++++++++---
lib/debug_locks.c | 2
lib/locking-selftest.c | 439 +++++++++++++++++++++++++++++++++-
14 files changed, 1469 insertions(+), 93 deletions(-)
create mode 100644 Documentation/ww-mutex-design.txt
Hello,
Contiguous Memory Allocator is very sensitive to migration failures
of individual pages. A single page that permanently fails to migrate
can break large contiguous allocations and cause a multimedia device
driver to fail.
One of the known issues with migration of CMA pages is migrating
anonymous user pages on which someone else has called
get_user_pages(). That call takes a reference on the given user pages so
that the kernel can operate directly on the page content. It is usually
used to prevent the page contents from being swapped out and to do direct
DMA to/from userspace.
Solving this issue requires preventing pages placed in CMA regions from
being locked for a long time. Our idea is to migrate the anonymous page
content before locking the page in get_user_pages(). This cannot be done
automatically, because the get_user_pages() interface is used very often
for various operations that usually last only a short time (for example
the exec syscall). We have added a new flag indicating that the given
get_user_pages() call will grab pages for a long time, so the migration
workaround is suitable in such cases.
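As a rough illustration of the idea above (not the code from this series),
the following userspace C sketch models a pin operation that first moves a
page out of a CMA region when the caller announces a long-lived pin. All
toy_* names and the flag value are invented for this example; only the
FOLL_DURABLE name is borrowed from the patch titles in this series.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy flag value; only the FOLL_DURABLE name comes from the patch titles. */
#define TOY_FOLL_DURABLE 0x1u

/* Toy stand-in for a page; this is not the kernel's struct page. */
struct toy_page {
	bool in_cma;     /* page currently lives in a CMA region */
	int pin_count;   /* references taken by toy_get_user_page() */
	int migrations;  /* how many times the content was moved */
};

/* Pretend to move the page content to a page outside any CMA region. */
static void toy_migrate_out_of_cma(struct toy_page *page)
{
	page->in_cma = false;
	page->migrations++;
}

/*
 * Pin one page. Short-lived pins leave the page where it is. Long-lived
 * (durable) pins first move the content out of CMA, so the pinned page
 * cannot block a later contiguous allocation.
 */
static void toy_get_user_page(struct toy_page *page, unsigned int flags)
{
	assert(page != NULL);
	if ((flags & TOY_FOLL_DURABLE) && page->in_cma)
		toy_migrate_out_of_cma(page);
	page->pin_count++;
}
```

In this model a caller that passes no flag pins the page in place, while a
caller that passes TOY_FOLL_DURABLE pays for one migration up front.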
The proposed extension is used by V4L2/VideoBuf2
(drivers/media/v4l2-core/videobuf2-dma-contig.c), but that is not the
only place which might benefit from it; any driver which does DMA to
userspace with get_user_pages() could. This one is provided to demonstrate
the use case.
I would like to hear some comments on the presented approach. What do
you think about it? Is there a chance to get such a workaround merged into
mainline at some point?
Best regards
Marek Szyprowski
Samsung Poland R&D Center
Patch summary:
Marek Szyprowski (5):
mm: introduce migrate_replace_page() for migrating page to the given
target
mm: get_user_pages: use static inline
mm: get_user_pages: use NON-MOVABLE pages when FOLL_DURABLE flag is
set
mm: get_user_pages: migrate out CMA pages when FOLL_DURABLE flag is
set
media: vb2: use FOLL_DURABLE and __get_user_pages() to avoid CMA
migration issues
drivers/media/v4l2-core/videobuf2-dma-contig.c | 8 +-
include/linux/highmem.h | 12 ++-
include/linux/migrate.h | 5 +
include/linux/mm.h | 76 ++++++++++++-
mm/internal.h | 12 +++
mm/memory.c | 136 +++++++++++-------------
mm/migrate.c | 59 ++++++++++
7 files changed, 225 insertions(+), 83 deletions(-)
--
1.7.9.5
On Wed, May 1, 2013 at 6:30 AM, Dave Airlie <airlied(a)gmail.com> wrote:
> Since we ask the dmabuf owner to map the dma-buf into our device
> address space, but for udl at present that is the CPU address space,
> since we don't DMA directly from the mapped buffer.
>
> However if we don't set a dma mask on the usb device, the mapping
> ends up using swiotlb on machines that have it enabled, which
> is less than desireable.
>
> Signed-off-by: Dave Airlie <airlied(a)redhat.com>
Fyi for everyone else who was not on irc when Dave & I discussed this:
This really shouldn't be required and I think the real issue is that
udl creates a dma_buf attachment (which is needed for device dma
only), but only really wants to do cpu access through vmap/kmap. So
not attaching the device should be good enough. Cc'ing a few more lists
for better fyi ;-)
-Daniel
> ---
> drivers/gpu/drm/udl/udl_main.c | 1 +
> 1 file changed, 1 insertion(+)
>
> diff --git a/drivers/gpu/drm/udl/udl_main.c b/drivers/gpu/drm/udl/udl_main.c
> index 0ce2d71..6770e1b 100644
> --- a/drivers/gpu/drm/udl/udl_main.c
> +++ b/drivers/gpu/drm/udl/udl_main.c
> @@ -293,6 +293,7 @@ int udl_driver_load(struct drm_device *dev, unsigned long flags)
> udl->ddev = dev;
> dev->dev_private = udl;
>
> + dma_set_mask(dev->dev, DMA_BIT_MASK(64));
> if (!udl_parse_vendor_descriptor(dev, dev->usbdev)) {
> DRM_ERROR("firmware not recognized. Assume incompatible device\n");
> goto err;
> --
> 1.8.2
--
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch
Hi all,
I've been looking at a better way to do custom dma allocation algorithms
in a similar style to Ion heaps. Most drivers/clients have come up with
a series of semi-standard ways to get memory (CMA, memblock_reserve,
discontiguous pages etc.). As these allocation schemes get more and
more complex, there needs to be a single place where all clients (Ion
based driver vs. DRM driver vs. ???) can independently take advantage
of any optimizations and call a single API for the backing allocations.
The dma_map_ops take care of almost everything needed for abstraction
but the question is where should new allocation algorithms be located?
Most of the work has been added to either arm/mm/dma-mapping.c or
dma-contiguous.c. My current thought:
1) split out the dma_map_ops currently in dma-mapping.c into separate
files (dma-mapping-common.c, dma-mapping-iommu.c)
2) Extend dma-contiguous.c to support memblock_reserve memory
3) Place additional algorithms in either arch/arm/mm or
drivers/base/dma-alloc/ as appropriate to the code. This is the part
where I'm most unsure about the direction.
I don't have anything written yet but I plan to draft some patches
assuming the proposed approach sounds reasonable and no one else has
started on something similar already.
Thoughts? Opinions?
Thanks,
Laura
--
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
hosted by The Linux Foundation
Hi Linus,
The 3.10 pull request for dma-buf framework updates: small one, could
you please pull?
Thanks and best regards,
~Sumit.
The following changes since commit 5f56886521d6ddd3648777fae44d82382dd8c87f:
Merge branch 'akpm' (incoming from Andrew) (2013-04-30 17:37:43 -0700)
are available in the git repository at:
git://git.linaro.org/people/sumitsemwal/linux-dma-buf.git
tags/tag-for-linus-3.10
for you to fetch changes up to b89e35636bc75b72d15a1af6d49798802aff77d5:
dma-buf: Add debugfs support (2013-05-01 16:36:22 +0530)
----------------------------------------------------------------
3.10 dma-buf updates
Added debugfs support to dma-buf.
----------------------------------------------------------------
Sumit Semwal (2):
dma-buf: replace dma_buf_export() with dma_buf_export_named()
dma-buf: Add debugfs support
Documentation/dma-buf-sharing.txt | 13 ++-
drivers/base/dma-buf.c | 169 +++++++++++++++++++++++++++++++++++++-
include/linux/dma-buf.h | 16 +++-
3 files changed, 189 insertions(+), 9 deletions(-)
Hello,
This is an update for my proposal for device tree integration for
Contiguous Memory Allocator. The code is quite straightforward, but I
expect that the device tree bindings will again trigger some discussion.
Just a few words for those who see this code for the first time:
The proposed bindings allow defining contiguous memory regions with a
specified base address and size. The defined regions can then be
assigned to the given device(s) by adding a property with a phandle to
the defined contiguous memory region. From the device tree perspective
that's all. Once the bindings are added, all memory allocations from the
dma-mapping subsystem will be served from the defined contiguous memory
regions.
Contiguous Memory Allocator is a framework which provides large
contiguous memory buffers for (usually multimedia) devices. The
contiguous memory is reserved during early boot and then shared with the
kernel, which is allowed to allocate it for movable pages. Then, when a
device driver requests a contiguous buffer, the framework migrates
movable pages out of the contiguous region and gives it to the driver. When
the device driver frees the buffer, it is added to the kernel memory pool again.
For more information, please refer to commit c64be2bb1c6eb43c838b2c6d57
("drivers: add Contiguous Memory Allocator") and d484864dd96e1830e76895
(CMA merge commit).
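The reserve/share/migrate cycle described above can be modelled with a small
userspace C sketch. This is only a toy under stated assumptions: the fixed
region size, the three page states and every toy_* name are invented here,
and real CMA works on page blocks with actual page migration.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_REGION_PAGES 8

/* State of each page inside the reserved region (toy model only). */
enum toy_state { TOY_FREE, TOY_MOVABLE, TOY_DEVICE };

struct toy_cma {
	enum toy_state page[TOY_REGION_PAGES];
	size_t migrated; /* movable pages pushed out of the region so far */
};

/*
 * Find a run of n pages not owned by a device, migrate any movable pages
 * out of it and hand the run to the device. Returns the first page index,
 * or -1 if no such run exists.
 */
static int toy_cma_alloc(struct toy_cma *cma, size_t n)
{
	assert(n > 0);
	for (size_t start = 0; start + n <= TOY_REGION_PAGES; start++) {
		size_t i;

		for (i = 0; i < n; i++)
			if (cma->page[start + i] == TOY_DEVICE)
				break;
		if (i < n)
			continue; /* run is blocked by another device buffer */

		for (i = 0; i < n; i++) {
			if (cma->page[start + i] == TOY_MOVABLE)
				cma->migrated++; /* pretend the content moved elsewhere */
			cma->page[start + i] = TOY_DEVICE;
		}
		return (int)start;
	}
	return -1;
}

/* Give the buffer back so the pages can serve movable allocations again. */
static void toy_cma_release(struct toy_cma *cma, int start, size_t n)
{
	for (size_t i = 0; i < n; i++)
		cma->page[(size_t)start + i] = TOY_FREE;
}
```

The sketch also shows why a page that cannot be migrated is a problem: in
this model only TOY_DEVICE pages block a run, which corresponds to the
assumption that every other page in the region can always be moved.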
Why do we need device tree bindings for CMA at all?
Older ARM kernels used so-called board-based initialization. Those board
files contained a definition of all hardware blocks available on the
target system and the particular kernel and driver software configuration
selected by the board maintainer.
In the new approach the board files will be removed completely and the
Device Tree is used to describe all hardware blocks available
on the target system. By definition, the bindings should be software
independent, so at least in theory it should be possible to use those
bindings with operating systems other than the Linux kernel.
However, we also need to somehow pass the kernel and device driver
software-only configuration data that was earlier encoded in the board
file. For such data I propose to use the /chosen node, where the kernel
command line is already stored. Future bootloaders will allow modifying
or replacing particular nodes, and one will be able to use a custom
/chosen node to configure the system. The proposed patches introduce the
/chosen/contiguous-memory node and related bindings, to avoid a
complicated encoding of CMA-related configuration in the kernel command
line.
Best regards
Marek Szyprowski
Samsung Poland R&D Center
Changelog:
v2:
- moved contiguous-memory bindings from /memory to /chosen/contiguous-memory/
node to avoid spreading Linux specific parameters over the whole device
tree definitions
- added support for autoconfigured regions (use zero base)
- fixed minor bugs
v1: http://thread.gmane.org/gmane.linux.drivers.devicetree/30111/
- initial proposal
Patch summary:
Marek Szyprowski (2):
drivers: dma-contiguous: clean source code and prepare for device
tree
drivers: dma-contiguous: add initialization from device tree
Documentation/devicetree/bindings/memory.txt | 101 ++++++++++
arch/arm/boot/dts/skeleton.dtsi | 7 +-
drivers/base/dma-contiguous.c | 278 +++++++++++++++++++-------
include/asm-generic/dma-contiguous.h | 4 +-
include/linux/dma-contiguous.h | 32 ++-
5 files changed, 338 insertions(+), 84 deletions(-)
create mode 100644 Documentation/devicetree/bindings/memory.txt
--
1.7.9.5
This will allow me to call functions that have multiple arguments if fastpath fails.
This is required to support ticket mutexes, because they need to be able to pass an
extra argument to the fail function.
Originally I duplicated the functions, by adding __mutex_fastpath_lock_retval_arg.
This ended up being just a duplication of the existing function, so a way to test
whether the fastpath succeeded ended up being better.
This also cleaned up the reservation mutex patch some by being able to call an
atomic_set instead of atomic_xchg, and making it easier to detect if the wrong
unlock function was previously used.
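
To make the new calling convention concrete, here is a minimal userspace
sketch using C11 atomics instead of the kernel's atomic_t. It is not the
kernel code: the toy_* names, the bookkeeping counter and the non-blocking
slowpath are assumptions made for this example. It only shows that once the
fastpath merely reports failure, the caller is free to call a slowpath with
any number of arguments.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Toy mutex: count is 1 when unlocked, 0 or negative when locked. */
struct toy_mutex {
	atomic_int count;
	int slowpath_calls; /* bookkeeping for this example only */
};

/* Fastpath: decrement count; return 0 on success, -1 if contended. */
static int toy_fastpath_lock_retval(atomic_int *count)
{
	/* atomic_fetch_sub returns the old value, so the new one is old - 1. */
	if (atomic_fetch_sub(count, 1) - 1 < 0)
		return -1;
	return 0;
}

/*
 * Slowpath taking an extra argument. A real slowpath would sleep until the
 * lock is available; this stand-in only records the call and returns the
 * caller-chosen value.
 */
static int toy_lock_slowpath(struct toy_mutex *lock, int extra)
{
	lock->slowpath_calls++;
	return extra;
}

/* The caller, not the fastpath helper, decides which slowpath to invoke. */
static int toy_lock(struct toy_mutex *lock, int extra)
{
	assert(lock != NULL);
	if (toy_fastpath_lock_retval(&lock->count) == 0)
		return 0;
	return toy_lock_slowpath(lock, extra);
}
```

With the old convention the helper itself called fail_fn(count), so the
slowpath signature was fixed to a single atomic_t pointer.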
Signed-off-by: Maarten Lankhorst <maarten.lankhorst(a)canonical.com>
---
arch/ia64/include/asm/mutex.h | 10 ++++------
arch/powerpc/include/asm/mutex.h | 10 ++++------
arch/sh/include/asm/mutex-llsc.h | 4 ++--
arch/x86/include/asm/mutex_32.h | 11 ++++-------
arch/x86/include/asm/mutex_64.h | 11 ++++-------
include/asm-generic/mutex-dec.h | 10 ++++------
include/asm-generic/mutex-null.h | 2 +-
include/asm-generic/mutex-xchg.h | 10 ++++------
kernel/mutex.c | 32 ++++++++++++++------------------
9 files changed, 41 insertions(+), 59 deletions(-)
diff --git a/arch/ia64/include/asm/mutex.h b/arch/ia64/include/asm/mutex.h
index bed73a6..f41e66d 100644
--- a/arch/ia64/include/asm/mutex.h
+++ b/arch/ia64/include/asm/mutex.h
@@ -29,17 +29,15 @@ __mutex_fastpath_lock(atomic_t *count, void (*fail_fn)(atomic_t *))
* __mutex_fastpath_lock_retval - try to take the lock by moving the count
* from 1 to a 0 value
* @count: pointer of type atomic_t
- * @fail_fn: function to call if the original value was not 1
*
- * Change the count from 1 to a value lower than 1, and call <fail_fn> if
- * it wasn't 1 originally. This function returns 0 if the fastpath succeeds,
- * or anything the slow path function returns.
+ * Change the count from 1 to a value lower than 1. This function returns 0
+ * if the fastpath succeeds, or -1 otherwise.
*/
static inline int
-__mutex_fastpath_lock_retval(atomic_t *count, int (*fail_fn)(atomic_t *))
+__mutex_fastpath_lock_retval(atomic_t *count)
{
if (unlikely(ia64_fetchadd4_acq(count, -1) != 1))
- return fail_fn(count);
+ return -1;
return 0;
}
diff --git a/arch/powerpc/include/asm/mutex.h b/arch/powerpc/include/asm/mutex.h
index 5399f7e..127ab23 100644
--- a/arch/powerpc/include/asm/mutex.h
+++ b/arch/powerpc/include/asm/mutex.h
@@ -82,17 +82,15 @@ __mutex_fastpath_lock(atomic_t *count, void (*fail_fn)(atomic_t *))
* __mutex_fastpath_lock_retval - try to take the lock by moving the count
* from 1 to a 0 value
* @count: pointer of type atomic_t
- * @fail_fn: function to call if the original value was not 1
*
- * Change the count from 1 to a value lower than 1, and call <fail_fn> if
- * it wasn't 1 originally. This function returns 0 if the fastpath succeeds,
- * or anything the slow path function returns.
+ * Change the count from 1 to a value lower than 1. This function returns 0
+ * if the fastpath succeeds, or -1 otherwise.
*/
static inline int
-__mutex_fastpath_lock_retval(atomic_t *count, int (*fail_fn)(atomic_t *))
+__mutex_fastpath_lock_retval(atomic_t *count)
{
if (unlikely(__mutex_dec_return_lock(count) < 0))
- return fail_fn(count);
+ return -1;
return 0;
}
diff --git a/arch/sh/include/asm/mutex-llsc.h b/arch/sh/include/asm/mutex-llsc.h
index 090358a..dad29b6 100644
--- a/arch/sh/include/asm/mutex-llsc.h
+++ b/arch/sh/include/asm/mutex-llsc.h
@@ -37,7 +37,7 @@ __mutex_fastpath_lock(atomic_t *count, void (*fail_fn)(atomic_t *))
}
static inline int
-__mutex_fastpath_lock_retval(atomic_t *count, int (*fail_fn)(atomic_t *))
+__mutex_fastpath_lock_retval(atomic_t *count)
{
int __done, __res;
@@ -51,7 +51,7 @@ __mutex_fastpath_lock_retval(atomic_t *count, int (*fail_fn)(atomic_t *))
: "t");
if (unlikely(!__done || __res != 0))
- __res = fail_fn(count);
+ __res = -1;
return __res;
}
diff --git a/arch/x86/include/asm/mutex_32.h b/arch/x86/include/asm/mutex_32.h
index 03f90c8..b7f6b34 100644
--- a/arch/x86/include/asm/mutex_32.h
+++ b/arch/x86/include/asm/mutex_32.h
@@ -42,17 +42,14 @@ do { \
* __mutex_fastpath_lock_retval - try to take the lock by moving the count
* from 1 to a 0 value
* @count: pointer of type atomic_t
- * @fail_fn: function to call if the original value was not 1
*
- * Change the count from 1 to a value lower than 1, and call <fail_fn> if it
- * wasn't 1 originally. This function returns 0 if the fastpath succeeds,
- * or anything the slow path function returns
+ * Change the count from 1 to a value lower than 1. This function returns 0
+ * if the fastpath succeeds, or -1 otherwise.
*/
-static inline int __mutex_fastpath_lock_retval(atomic_t *count,
- int (*fail_fn)(atomic_t *))
+static inline int __mutex_fastpath_lock_retval(atomic_t *count)
{
if (unlikely(atomic_dec_return(count) < 0))
- return fail_fn(count);
+ return -1;
else
return 0;
}
diff --git a/arch/x86/include/asm/mutex_64.h b/arch/x86/include/asm/mutex_64.h
index 68a87b0..2c543ff 100644
--- a/arch/x86/include/asm/mutex_64.h
+++ b/arch/x86/include/asm/mutex_64.h
@@ -37,17 +37,14 @@ do { \
* __mutex_fastpath_lock_retval - try to take the lock by moving the count
* from 1 to a 0 value
* @count: pointer of type atomic_t
- * @fail_fn: function to call if the original value was not 1
*
- * Change the count from 1 to a value lower than 1, and call <fail_fn> if
- * it wasn't 1 originally. This function returns 0 if the fastpath succeeds,
- * or anything the slow path function returns
+ * Change the count from 1 to a value lower than 1. This function returns 0
+ * if the fastpath succeeds, or -1 otherwise.
*/
-static inline int __mutex_fastpath_lock_retval(atomic_t *count,
- int (*fail_fn)(atomic_t *))
+static inline int __mutex_fastpath_lock_retval(atomic_t *count)
{
if (unlikely(atomic_dec_return(count) < 0))
- return fail_fn(count);
+ return -1;
else
return 0;
}
diff --git a/include/asm-generic/mutex-dec.h b/include/asm-generic/mutex-dec.h
index f104af7..d4f9fb4 100644
--- a/include/asm-generic/mutex-dec.h
+++ b/include/asm-generic/mutex-dec.h
@@ -28,17 +28,15 @@ __mutex_fastpath_lock(atomic_t *count, void (*fail_fn)(atomic_t *))
* __mutex_fastpath_lock_retval - try to take the lock by moving the count
* from 1 to a 0 value
* @count: pointer of type atomic_t
- * @fail_fn: function to call if the original value was not 1
*
- * Change the count from 1 to a value lower than 1, and call <fail_fn> if
- * it wasn't 1 originally. This function returns 0 if the fastpath succeeds,
- * or anything the slow path function returns.
+ * Change the count from 1 to a value lower than 1. This function returns 0
+ * if the fastpath succeeds, or -1 otherwise.
*/
static inline int
-__mutex_fastpath_lock_retval(atomic_t *count, int (*fail_fn)(atomic_t *))
+__mutex_fastpath_lock_retval(atomic_t *count)
{
if (unlikely(atomic_dec_return(count) < 0))
- return fail_fn(count);
+ return -1;
return 0;
}
diff --git a/include/asm-generic/mutex-null.h b/include/asm-generic/mutex-null.h
index e1bbbc7..efd6206 100644
--- a/include/asm-generic/mutex-null.h
+++ b/include/asm-generic/mutex-null.h
@@ -11,7 +11,7 @@
#define _ASM_GENERIC_MUTEX_NULL_H
#define __mutex_fastpath_lock(count, fail_fn) fail_fn(count)
-#define __mutex_fastpath_lock_retval(count, fail_fn) fail_fn(count)
+#define __mutex_fastpath_lock_retval(count) (-1)
#define __mutex_fastpath_unlock(count, fail_fn) fail_fn(count)
#define __mutex_fastpath_trylock(count, fail_fn) fail_fn(count)
#define __mutex_slowpath_needs_to_unlock() 1
diff --git a/include/asm-generic/mutex-xchg.h b/include/asm-generic/mutex-xchg.h
index c04e0db..f169ec0 100644
--- a/include/asm-generic/mutex-xchg.h
+++ b/include/asm-generic/mutex-xchg.h
@@ -39,18 +39,16 @@ __mutex_fastpath_lock(atomic_t *count, void (*fail_fn)(atomic_t *))
* __mutex_fastpath_lock_retval - try to take the lock by moving the count
* from 1 to a 0 value
* @count: pointer of type atomic_t
- * @fail_fn: function to call if the original value was not 1
*
- * Change the count from 1 to a value lower than 1, and call <fail_fn> if it
- * wasn't 1 originally. This function returns 0 if the fastpath succeeds,
- * or anything the slow path function returns
+ * Change the count from 1 to a value lower than 1. This function returns 0
+ * if the fastpath succeeds, or -1 otherwise.
*/
static inline int
-__mutex_fastpath_lock_retval(atomic_t *count, int (*fail_fn)(atomic_t *))
+__mutex_fastpath_lock_retval(atomic_t *count)
{
if (unlikely(atomic_xchg(count, 0) != 1))
if (likely(atomic_xchg(count, -1) != 1))
- return fail_fn(count);
+ return -1;
return 0;
}
diff --git a/kernel/mutex.c b/kernel/mutex.c
index 52f2301..84a5f07 100644
--- a/kernel/mutex.c
+++ b/kernel/mutex.c
@@ -351,10 +351,10 @@ __mutex_unlock_slowpath(atomic_t *lock_count)
* mutex_lock_interruptible() and mutex_trylock().
*/
static noinline int __sched
-__mutex_lock_killable_slowpath(atomic_t *lock_count);
+__mutex_lock_killable_slowpath(struct mutex *lock);
static noinline int __sched
-__mutex_lock_interruptible_slowpath(atomic_t *lock_count);
+__mutex_lock_interruptible_slowpath(struct mutex *lock);
/**
* mutex_lock_interruptible - acquire the mutex, interruptible
@@ -372,12 +372,12 @@ int __sched mutex_lock_interruptible(struct mutex *lock)
int ret;
might_sleep();
- ret = __mutex_fastpath_lock_retval
- (&lock->count, __mutex_lock_interruptible_slowpath);
- if (!ret)
+ ret = __mutex_fastpath_lock_retval(&lock->count);
+ if (likely(!ret)) {
mutex_set_owner(lock);
-
- return ret;
+ return 0;
+ } else
+ return __mutex_lock_interruptible_slowpath(lock);
}
EXPORT_SYMBOL(mutex_lock_interruptible);
@@ -387,12 +387,12 @@ int __sched mutex_lock_killable(struct mutex *lock)
int ret;
might_sleep();
- ret = __mutex_fastpath_lock_retval
- (&lock->count, __mutex_lock_killable_slowpath);
- if (!ret)
+ ret = __mutex_fastpath_lock_retval(&lock->count);
+ if (likely(!ret)) {
mutex_set_owner(lock);
-
- return ret;
+ return 0;
+ } else
+ return __mutex_lock_killable_slowpath(lock);
}
EXPORT_SYMBOL(mutex_lock_killable);
@@ -405,18 +405,14 @@ __mutex_lock_slowpath(atomic_t *lock_count)
}
static noinline int __sched
-__mutex_lock_killable_slowpath(atomic_t *lock_count)
+__mutex_lock_killable_slowpath(struct mutex *lock)
{
- struct mutex *lock = container_of(lock_count, struct mutex, count);
-
return __mutex_lock_common(lock, TASK_KILLABLE, 0, NULL, _RET_IP_);
}
static noinline int __sched
-__mutex_lock_interruptible_slowpath(atomic_t *lock_count)
+__mutex_lock_interruptible_slowpath(struct mutex *lock)
{
- struct mutex *lock = container_of(lock_count, struct mutex, count);
-
return __mutex_lock_common(lock, TASK_INTERRUPTIBLE, 0, NULL, _RET_IP_);
}
#endif
We unlock here when we failed to take the lock.
Signed-off-by: Dan Carpenter <dan.carpenter(a)oracle.com>
---
This is in linux-next, and I think the debugfs code is only in Sumit's
tree.
diff --git a/drivers/base/dma-buf.c b/drivers/base/dma-buf.c
index 466476f..174cd2c 100644
--- a/drivers/base/dma-buf.c
+++ b/drivers/base/dma-buf.c
@@ -593,7 +593,7 @@ static int dma_buf_describe(struct seq_file *s)
if (ret) {
seq_printf(s,
"\tERROR locking buffer object: skipping\n");
- goto skip_buffer;
+ continue;
}
seq_printf(s, "\t");
@@ -618,7 +618,6 @@ static int dma_buf_describe(struct seq_file *s)
count++;
size += buf_obj->size;
-skip_buffer:
mutex_unlock(&buf_obj->lock);
}
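
The pattern fixed above can be shown in isolation with a small userspace
sketch. It is not the dma-buf code: the toy lock, the toy_buf structure and
all function names are assumptions made for this example. The point is only
that a failed lock attempt must skip the unlock at the bottom of the loop.

```c
#include <assert.h>
#include <stddef.h>

/* Toy buffer with a toy lock whose acquisition can be made to fail. */
struct toy_buf {
	int lock_fails; /* if nonzero, taking the lock fails */
	int held;       /* 1 while the lock is held */
	size_t size;
};

static int toy_lock(struct toy_buf *b)
{
	if (b->lock_fails)
		return -1;
	b->held = 1;
	return 0;
}

static void toy_unlock(struct toy_buf *b)
{
	assert(b->held); /* unlocking a lock we never took is a bug */
	b->held = 0;
}

/* Sum the sizes of all buffers that could be locked; count the others. */
static size_t toy_describe(struct toy_buf *bufs, size_t n, size_t *skipped)
{
	size_t total = 0;

	*skipped = 0;
	for (size_t i = 0; i < n; i++) {
		if (toy_lock(&bufs[i])) {
			(*skipped)++;
			continue; /* not locked, so do not reach the unlock */
		}
		total += bufs[i].size;
		toy_unlock(&bufs[i]);
	}
	return total;
}
```

Jumping to a label placed just before toy_unlock() instead of using continue
would trip the assertion for every buffer whose lock attempt failed.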
Hi Tom,
On Tuesday 09 April 2013 12:21:08 Tom Cooksey wrote:
> Hi All,
>
> Last year Laurent posted an RFC patch[i] to add support for exporting an
> fbdev framebuffer through dma_buf. Looking through the mailing list
> archives, it doesn't appear to have progressed beyond an RFC? What would be
> needed to get this merged? It would be useful for our Mali T6xx driver
> (which supports importing dma_buf buffers) to allow the GPU to draw
> directly into the framebuffer on platforms which lack a DRM/KMS driver.
The patch was pretty simple, I don't think it would take much effort to
get it to mainline. On the other hand, fbdev is a dying API, so I'm not sure
how much energy we want to spend on upgrading it. I suppose all that would be
needed is a developer with enough interest in the topic to fix the patch
according to the comments.
> [i] Subject: "[RFC/PATCH] fb: Add dma-buf support", sent 20/06/2012.
--
Regards,
Laurent Pinchart
Hi All,
Last year Laurent posted an RFC patch[i] to add support for exporting an fbdev framebuffer through
dma_buf. Looking through the mailing list archives, it doesn't appear to have progressed beyond an
RFC? What would be needed to get this merged? It would be useful for our Mali T6xx driver (which
supports importing dma_buf buffers) to allow the GPU to draw directly into the framebuffer on
platforms which lack a DRM/KMS driver.
[i] Subject: "[RFC/PATCH] fb: Add dma-buf support", sent 20/06/2012.
Cheers,
Tom
The patch series adds much-missed debugfs support to the dma-buf framework.
Based on the feedback received on v1 of this patch series, support is also
added to allow exporters to provide name-strings that will prove useful
while debugging.
Some more magic can be added for more advanced debugging, but we'll leave that
for the time being.
Best regards,
~Sumit.
---
changes since v2: (based on review comments from Laurent Pinchart)
- reordered functions to avoid forward declaration
- added __exitcall for dma_buf_deinit()
changes since v1:
- added patch to replace dma_buf_export() with dma_buf_export_named(), per
suggestion from Daniel Vetter.
- fixes on init and warnings as reported and corrected by Dave Airlie.
- added locking while walking attachment list - reported by Daniel Vetter.
Sumit Semwal (2):
dma-buf: replace dma_buf_export() with dma_buf_export_named()
dma-buf: Add debugfs support
Documentation/dma-buf-sharing.txt | 13 ++-
drivers/base/dma-buf.c | 170 ++++++++++++++++++++++++++++++++++++-
include/linux/dma-buf.h | 16 +++-
3 files changed, 190 insertions(+), 9 deletions(-)
--
1.7.10.4
The patch series adds much-missed debugfs support to the dma-buf framework.
Based on the feedback received on v1 of this patch series, support is also
added to allow exporters to provide name-strings that will prove useful
while debugging.
Some more magic can be added for more advanced debugging, but we'll leave that
for the time being.
Best regards,
~Sumit.
Sumit Semwal (2):
dma-buf: replace dma_buf_export() with dma_buf_export_named()
dma-buf: Add debugfs support
Documentation/dma-buf-sharing.txt | 13 ++-
drivers/base/dma-buf.c | 173 ++++++++++++++++++++++++++++++++++++-
include/linux/dma-buf.h | 16 +++-
3 files changed, 193 insertions(+), 9 deletions(-)
--
1.7.10.4
The goal of these patches is to allow ION clients (drivers or userland applications)
to use the Contiguous Memory Allocator (CMA).
To get more info about CMA:
http://lists.linaro.org/pipermail/linaro-mm-sig/2012-February/001328.html
patches version 10:
- stop adding private field in ion_heap structure
- put ion_heap into struct ion_cma_heap
patches version 9:
- rebased on Android kernel
- make cma heap able to support ION_FLAG_CACHED flag
patches version 8:
- fix memory leak when releasing sg_table
- remove virt_to_phys from ion_cma_phys
patches version 7:
- rebased on Android kernel
- fix ion Makefile
- add ion_cma_map_kernel function
- remove CONFIG_CMA compilation flags from ion_heap.c
patches version 6:
- add private field in ion_platform_heap to pass the device
linked with CMA.
- rework CMA heap to use private field.
- prepare CMA heap for incoming dma_common_get_sgtable function
http://lists.linaro.org/pipermail/linaro-mm-sig/2012-June/002109.html
- simplify ion-ux500 driver.
patches version 5:
- port patches to Android kernel 3.4, where ION uses dmabuf
- add ion_cma_heap_map_dma and ion_cma_heap_unmap_dma functions
patches version 4:
- add ION_HEAP_TYPE_DMA heap type in ion_heap_type enum.
- CMA heap is now a "native" ION heap.
- add ion_heap_create_full function to keep backward compatibility.
- clean up included files in CMA heap
- ux500-ion is using ion_heap_create_full instead of ion_heap_create
patches version 3:
- add a private field in ion_heap structure instead of exposing the ion_device
structure to all heaps
- ion_cma_heap is no longer a platform driver
- ion_cma_heap uses the ion_heap private field to store the device pointer and
make the link with reserved CMA regions
- provide ux500-ion driver and configuration file for snowball board to give
an example of how to use CMA heaps
patches version 2:
- address review comments from Andy Green
Benjamin Gaignard (2):
gpu: ion: fix ion_platform_data definition
gpu: ion: add CMA heap
drivers/gpu/ion/Makefile | 1 +
drivers/gpu/ion/ion_cma_heap.c | 234 ++++++++++++++++++++++++++++++++++++++++
drivers/gpu/ion/ion_heap.c | 6 ++
drivers/gpu/ion/ion_priv.h | 14 +++
include/linux/ion.h | 5 +-
5 files changed, 259 insertions(+), 1 deletion(-)
create mode 100644 drivers/gpu/ion/ion_cma_heap.c
--
1.7.10
The goal of these patches is to allow ION clients (drivers or userland applications)
to use the Contiguous Memory Allocator (CMA).
To get more info about CMA:
http://lists.linaro.org/pipermail/linaro-mm-sig/2012-February/001328.html
patches version 9:
- rebased on Android kernel
- make cma heap able to support ION_FLAG_CACHED flag
patches version 8:
- fix memory leak when releasing sg_table
- remove virt_to_phys from ion_cma_phys
patches version 7:
- rebased on Android kernel
- fix ion Makefile
- add ion_cma_map_kernel function
- remove CONFIG_CMA compilation flags from ion_heap.c
patches version 6:
- add private field in ion_platform_heap to pass the device
linked with CMA.
- rework CMA heap to use private field.
- prepare CMA heap for incoming dma_common_get_sgtable function
http://lists.linaro.org/pipermail/linaro-mm-sig/2012-June/002109.html
- simplify ion-ux500 driver.
patches version 5:
- port patches to Android kernel 3.4, where ION uses dmabuf
- add ion_cma_heap_map_dma and ion_cma_heap_unmap_dma functions
patches version 4:
- add ION_HEAP_TYPE_DMA heap type in ion_heap_type enum.
- CMA heap is now a "native" ION heap.
- add ion_heap_create_full function to keep backward compatibility.
- clean up included files in CMA heap
- ux500-ion is using ion_heap_create_full instead of ion_heap_create
patches version 3:
- add a private field in ion_heap structure instead of exposing the ion_device
structure to all heaps
- ion_cma_heap is no longer a platform driver
- ion_cma_heap uses the ion_heap private field to store the device pointer and
make the link with reserved CMA regions
- provide ux500-ion driver and configuration file for snowball board to give
an example of how to use CMA heaps
patches version 2:
- address review comments from Andy Green
Benjamin Gaignard (3):
gpu: ion: fix ion_platform_data definition
gpu: ion: add private field in ion_heap and ion_platform_heap
structure
gpu: ion: add CMA heap
drivers/gpu/ion/Makefile | 1 +
drivers/gpu/ion/ion_cma_heap.c | 221 ++++++++++++++++++++++++++++++++++++++++
drivers/gpu/ion/ion_heap.c | 7 ++
drivers/gpu/ion/ion_priv.h | 16 +++
include/linux/ion.h | 5 +-
5 files changed, 249 insertions(+), 1 deletion(-)
create mode 100644 drivers/gpu/ion/ion_cma_heap.c
--
1.7.10
Hello,
Here is my initial proposal for device tree integration for Contiguous
Memory Allocator. The code is quite straightforward, however I expect
that the memory bindings require some discussion.
The proposed bindings allow defining contiguous memory regions with a
specified base address and size. The defined regions can then be
assigned to the given device(s) by adding a property with a phandle to
the defined contiguous memory region. From the device tree perspective
that's all. Once the bindings are added, all memory allocations from the
dma-mapping subsystem will be served from the defined contiguous memory
regions.
Contiguous Memory Allocator is a framework which provides large
contiguous memory buffers for (usually multimedia) devices. The
contiguous memory is reserved during early boot and then shared with the
kernel, which is allowed to allocate it for movable pages. Then, when a
device driver requests a contiguous buffer, the framework migrates
movable pages out of the contiguous region and gives it to the driver. When
the device driver frees the buffer, it is added to the kernel memory pool again.
For more information, please refer to commit c64be2bb1c6eb43c838b2c6d57
("drivers: add Contiguous Memory Allocator") and d484864dd96e1830e76895
(CMA merge commit).
Best regards
Marek Szyprowski
Samsung Poland R&D Center
Patch summary:
Marek Szyprowski (2):
drivers: dma-contiguous: clean source code and prepare for device
tree
drivers: dma-contiguous: add initialization from device tree
Documentation/devicetree/bindings/memory.txt | 101 ++++++++++
arch/arm/boot/dts/skeleton.dtsi | 7 +-
drivers/base/dma-contiguous.c | 278 +++++++++++++++++++-------
include/asm-generic/dma-contiguous.h | 4 +-
include/linux/dma-contiguous.h | 32 ++-
5 files changed, 338 insertions(+), 84 deletions(-)
create mode 100644 Documentation/devicetree/bindings/memory.txt
--
1.7.9.5
This is to let you know that the migration of lists.linaro.org has been
successfully completed.
As per the email I sent on Wednesday, it may take some time for the new
address of the server to be seen by your computer. You can check this by
trying to connect to the web site:
http://lists.linaro.org/
If you are able to connect and you do not get an error, this means you are
connecting to the new server and you can send email to the lists.
If you experience any problems after the weekend and you find that you
still cannot connect to the server, please reply to this email to let us
know.
Regards
Philip
IT Services Manager
Linaro
Hello
You are receiving this email because you are subscribed to one or more
mailing lists provided by the lists.linaro.org server.
IT Services are announcing planned maintenance for this server scheduled
for *Friday 15th March 2013, starting at 2pm GMT*. The purpose of the work
is to move the service to another server. There will be some disruption
during this maintenance.
In order to ensure that you do not accidentally try to use the service
while it is being moved, the current server will be shut down at 2pm.
A further email will be sent on Friday afternoon to confirm that the
migration of the service is completed. However, due to the way servers are
found, it may take a while before your computer is able to connect to the
relocated service.
After the old server has been shut down, email sent to any of the lists
will be queued, but it is possible that the sending server will still be
trying to deliver the email to the old server rather than the new one when
it is started.
It is therefore *strongly* recommended that you do not send any email to an
@lists.linaro.org email address until you can connect to the new service,
which you will be able to test by trying to use a web browser to connect to
http://lists.linaro.org after you receive the email confirming that the
migration has been completed. Since the old service will be shut down, if
you are able to connect, you can be sure you have connected to the new
service.
If by Monday you are still unable to connect to the service or you are not
able to send email to an @lists.linaro.org email address, please send an
email to its(a)linaro.org.
Thank you.
Regards
Philip
IT Services Manager
Linaro
Atomic pool should always be allocated from DMA zone if such zone is
available in the system to avoid issues caused by limited dma mask of
any of the devices used for making an atomic allocation.
Reported-by: Krzysztof Halasa <khc(a)pm.waw.pl>
Signed-off-by: Marek Szyprowski <m.szyprowski(a)samsung.com>
---
arch/arm/mm/dma-mapping.c | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/arch/arm/mm/dma-mapping.c b/arch/arm/mm/dma-mapping.c
index c7e3759..e9db6b4 100644
--- a/arch/arm/mm/dma-mapping.c
+++ b/arch/arm/mm/dma-mapping.c
@@ -342,6 +342,7 @@ static int __init atomic_pool_init(void)
{
struct dma_pool *pool = &atomic_pool;
pgprot_t prot = pgprot_dmacoherent(pgprot_kernel);
+ gfp_t gfp = GFP_KERNEL | GFP_DMA;
unsigned long nr_pages = pool->size >> PAGE_SHIFT;
unsigned long *bitmap;
struct page *page;
@@ -361,8 +362,8 @@ static int __init atomic_pool_init(void)
ptr = __alloc_from_contiguous(NULL, pool->size, prot, &page,
atomic_pool_init);
else
- ptr = __alloc_remap_buffer(NULL, pool->size, GFP_KERNEL, prot,
- &page, atomic_pool_init);
+ ptr = __alloc_remap_buffer(NULL, pool->size, gfp, prot, &page,
+ atomic_pool_init);
if (ptr) {
int i;
--
1.7.9.5
As proposed yesterday, here's the Android sync driver patches for
staging.
I've preserved the commit history, but moved all the changes over
to be against the staging directory (instead of drivers/base).
The goal of submitting this driver to staging is to try to get
more collaboration, as there are some similar efforts going on
in the community with dmabuf-fences. My email from yesterday with
more details for how I hope this goes is here:
http://comments.gmane.org/gmane.linux.kernel/1448420
Erik also provided a nice background on the patch set in his
reply yesterday, which I'll quote here:
"In Honeycomb where we introduced the Hardware Composer HAL. This is a
userspace layer that allows composition acceleration on a per platform
basis. Different SoC vendors have implemented this using overlays, 2d
blitters, a combination of both, or other clever/disgusting means.
Along with the HWC we consolidated a lot of our camera and media
pipeline to allow their input to be fed into the GPU or
display (overlay). In order to exploit parallelism in the graphics
pipeline, this introduced lots of implicit synchronization
dependencies. After a couple years of working with many different SoC
vendors, we found that it was really difficult to communicate our
system's expectations of the implicit contract and it was difficult
for the SoC vendors to properly implement the implicit contract in
each of their IP blocks (display, gpu, camera, video codecs). It was
also incredibly difficult to debug when problems/deadlocks arose.
In an effort to clean up the situation we decided to create a set of
simple synchronization primitives and have our compositor
(SurfaceFlinger) manage the synchronization contract explicitly. We
designed these primitives so that they can be passed across processes
(much like ion/dma_buf handles), can be backed by hardware
synchronization primitives, and can be combined with other sync
dependencies in a heterogeneous manner. We also added enough
debugging information to make pinpointing a synchronization deadlock
bug easier. There are also OpenGL extensions added (which I believe
have been ratified by Khronos) to convert a "native" sync object to a
gl fence object and vice versa.
So far we have shipped this system on two products (the Nexus 10 and 4) with
two different SoCs (Samsung Exynos5250 and Qualcomm MSM8064.) On these
two projects it was much easier to work out the kinks in the
graphics/compositing pipelines. In addition we were able to use the
telemetry and tracing features to track down the causes of dropped
frames aka "jank."
As for the implementation, I started with having the main driver op
primitive be a wait() op. I quickly noticed that most of the tricky
race condition prone code was ending up in the driver's wait() op. It
also made handling asynchronous waits of more than one type of sync_pt
difficult to manage. In the end I opted for something roughly like
poll() where all the heavy lifting is done at the high level and the
drivers only need to implement a simple check function."
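To make the poll()-like split concrete, here is a rough userspace sketch of the idea: the driver supplies only a cheap check function, and a generic core combines the results for points that may come from different drivers. All `toy_*` names are hypothetical and chosen for illustration; this is not the code from the patch set, and it ignores locking, counter wraparound and asynchronous waiters.

```c
#include <stdbool.h>
#include <stddef.h>

struct toy_sync_pt;

/* The only hook a driver supplies in this model: a cheap status check. */
struct toy_timeline_ops {
	bool (*has_signaled)(const struct toy_sync_pt *pt);
};

/* A timeline is a monotonically increasing counter owned by one driver. */
struct toy_timeline {
	const struct toy_timeline_ops *ops;
	unsigned int value;
};

/* A sync point is a position on a timeline that will be reached later. */
struct toy_sync_pt {
	const struct toy_timeline *parent;
	unsigned int target;
};

/* Example driver check: the point has signaled once the counter reaches it. */
static bool toy_counter_has_signaled(const struct toy_sync_pt *pt)
{
	return pt->parent->value >= pt->target;
}

static const struct toy_timeline_ops toy_counter_ops = {
	.has_signaled = toy_counter_has_signaled,
};

/*
 * Generic core: a fence made of several points, possibly from different
 * drivers, is signaled once every point reports that it has signaled.
 */
static bool toy_fence_signaled(const struct toy_sync_pt *const *pts, size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (!pts[i]->parent->ops->has_signaled(pts[i]))
			return false;
	return true;
}
```

Because the core never blocks inside driver code, the race-prone waiting logic lives in one place instead of being repeated in every driver.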
Anyway, let me know what you think of the patches, and hopefully
this is something that could be considered for staging for 3.10
thanks
-john
Cc: Maarten Lankhorst <maarten.lankhorst(a)canonical.com>
Cc: Erik Gilling <konkers(a)android.com>
Cc: Daniel Vetter <daniel.vetter(a)ffwll.ch>
Cc: Rob Clark <robclark(a)gmail.com>
Cc: Sumit Semwal <sumit.semwal(a)linaro.org>
Cc: Greg KH <gregkh(a)linuxfoundation.org>
Cc: dri-devel(a)lists.freedesktop.org
Cc: linaro-mm-sig(a)lists.linaro.org
Cc: Android Kernel Team <kernel-team(a)android.com>
Erik Gilling (26):
staging: sync: Add synchronization framework
staging: sw_sync: Add cpu based sync driver
staging: sync: Add timestamps to sync_pts
staging: sync: Add debugfs support
staging: sw_sync: Add debug support
staging: sync: Add ioctl to get fence data
staging: sw_sync: Add fill_driver_data support
staging: sync: Add poll support
staging: sync: Allow async waits to be canceled
staging: sync: Export sync API symbols
staging: sw_sync: Export sw_sync API
staging: sync: Reorder sync_fence_release
staging: sync: Optimize fence merges
staging: sync: Add internal refcounting to fences
staging: sync: Add reference counting to timelines
staging: sync: Change wait timeout to mirror poll semantics
staging: sync: Dump sync state to console on timeout
staging: sync: Improve timeout dump messages
staging: sync: Dump sync state on fence errors
staging: sync: Protect unlocked access to fence status
staging: sync: Update new fence status with sync_fence_signal_pt
staging: sync: Use proper barriers when waiting indefinitely
staging: sync: Refactor sync debug printing
staging: sw_sync: Convert to use new value_str debug ops
staging: sync: Add tracepoint support
staging: sync: Don't log wait timeouts when timeout = 0
Jamie Gennis (1):
staging: sync: Fix timeout = 0 wait behavior
Rebecca Schultz Zavin (2):
staging: sync: Fix error paths
staging: sw_sync: Fix error paths
Ørjan Eide (1):
staging: sync: Fix race condition between merge and signal
drivers/staging/android/Kconfig | 27 +
drivers/staging/android/Makefile | 2 +
drivers/staging/android/sw_sync.c | 263 +++++++++
drivers/staging/android/sw_sync.h | 58 ++
drivers/staging/android/sync.c | 1016 ++++++++++++++++++++++++++++++++++
drivers/staging/android/sync.h | 426 ++++++++++++++
drivers/staging/android/trace/sync.h | 82 +++
7 files changed, 1874 insertions(+)
create mode 100644 drivers/staging/android/sw_sync.c
create mode 100644 drivers/staging/android/sw_sync.h
create mode 100644 drivers/staging/android/sync.c
create mode 100644 drivers/staging/android/sync.h
create mode 100644 drivers/staging/android/trace/sync.h
--
1.7.10.4
Hi everybody,
Here's a summary of the CDF BoF that took place at the ELC 2013.
I'd like to start by thanking all the participants who provided valuable
feedback (and those who didn't, but who now know a bit more about CDF and
will, I have no doubt about that, contribute in the future :-)). Thank you
also to Linus Walleij and Jesse Barker for taking notes during the meeting
while I was presenting. And obviously, thank you to Jesse Barker for
organizing the BoF.
I've tried to be as accurate as possible in this summary, but I might have
made mistakes. If you have attended the meeting, please point out any issue,
inconsistency, or just points I might have forgotten.
----
As not all attendees were familiar with CDF I started by briefly introducing
the problems that prompted me to start working on CDF.
CDF started as GPF, the Generic Panel Framework. While working on DT support
for a display controller driver I realized that panel control code was located
in board files. Moving the code somewhere in drivers/ was thus a prerequisite,
but it turned out that no framework existed in the kernel to support that
task. Several major display controller drivers (TI DSS and Samsung Exynos to
name a few) had a platform-specific panel driver framework, but the resulting
panel drivers wouldn't be reusable across different display controllers. A
need for a new framework became pretty evident to me.
After drafting an initial proposal and discussing it with several people
online and offline (in Helsinki with Tomi Valkeinen from TI, in Copenhagen at
Linaro Connect with Marcus Lorentzon from ST-Ericsson, and in Brussels during
a BoF at the FOSDEM) the need to support encoders in addition to panels
quickly arose, and GPF turned into CDF.
I then continued with an overview of the latest CDF code and its key concepts.
While I was expecting this to be a short overview followed by more in-depth
discussions, it turned out to support our discussions for the whole two-hour
meeting.
The latest available version at the time of the BoF (posted to the
linaro-mm-sig mailing list in reply to the BoF's announcement) was the "non-quite-v3"
version. It incorporated feedback received on v2 but hadn't been properly
tested yet.
The basic CDF building block is called a display entity, modeled as an
instance of struct display_entity. Entities have sink ports through which they
receive video data and/or source ports through which they transmit video data.
Entities are chained via their ports to create a display pipeline.
From the outside world entities are interfaced through two sets of abstract
operations they must provide:
- Control operations are called from "upper layers" (usually to implement
userspace requests) to get and set entity parameters (such as the physical
size, video modes, operation states, bus parameters, ...). Those operations
are implemented at the entity level.
Google asked how partial updates were handled; I answered that they're not handled
yet (this is a key concept behind the CDF RFCs: while I try to make sure all
devices can be supported, I first concentrate on hardware features required
for the devices I work on). Linus Walleij mentioned he thought that partial
updates were becoming out of fashion, but larger display sizes might keep them
useful in the future.
- Video operations control video streams. They're implemented by entities on
their source ports, and are called in the upstream (from a video pipeline
point of view) direction. A panel will call video operations of the entity it
gets its video stream from (this could be an HDMI transmitter, the display
controller directly, ...) to control the video stream it receives.
Video operations are split into a set of common operations and sets of display
bus specific operations (for DPI, DBI, DSI, ...). Some discussion around ops
that might be needed in some cases but not others indicated that the ops
structures are not quite finished for all bus types (and/or that some ops
might be considered for "promotion" to common). In particular the current DSI
implementation is copied from a proposal posted by Tomasz Figa from Samsung.
As I have no DSI hardware to test it on I have kept it as-is.
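The split between control and video operations can be illustrated with a small userspace sketch. It assumes a single common video operation and a single control operation; the `toy_*` structures and functions are invented for this example and do not reflect the actual struct display_entity layout or the CDF operation names.

```c
#include <stddef.h>

struct toy_entity;

/* Control operations: called by upper layers on the entity itself. */
struct toy_control_ops {
	int (*set_state)(struct toy_entity *ent, int on);
};

/*
 * Video operations: exposed by an entity on its source port and called
 * by the entity connected downstream.
 */
struct toy_video_ops {
	int (*set_stream)(struct toy_entity *ent, int on);
};

struct toy_entity {
	const struct toy_control_ops *control;
	const struct toy_video_ops *video;
	struct toy_entity *source;	/* upstream entity feeding our sink port */
	int streaming;
};

/* Display controller: starts or stops the stream on its source port. */
static int toy_controller_set_stream(struct toy_entity *ent, int on)
{
	ent->streaming = on;
	return 0;
}

static const struct toy_video_ops toy_controller_video_ops = {
	.set_stream = toy_controller_set_stream,
};

/* Panel: a control request travels upstream as a video operation. */
static int toy_panel_set_state(struct toy_entity *panel, int on)
{
	struct toy_entity *src = panel->source;

	if (!src || !src->video || !src->video->set_stream)
		return -1;
	return src->video->set_stream(src, on);
}

static const struct toy_control_ops toy_panel_control_ops = {
	.set_state = toy_panel_set_state,
};
```

Here the upper layer only ever talks to the panel; the panel decides when to ask whatever feeds it (display controller, HDMI transmitter, ...) to start or stop the stream.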
Jesse Barker pointed out that to make this fly we will need to get CDF into a
number of implementations, in particular the Samsung Exynos SoCs (needing
DSI). Several efforts are ongoing:
- Marcus Lorentzon (ST Ericsson, Linaro) is working on porting ST Ericsson
code to CDF, and in particular on the DSI interface.
- Tomasz Figa (Samsung) has worked on porting the Exynos display controller
driver to CDF and provided a DSI implementation.
- Tomi Valkeinen (TI) is working on porting the TI DSS driver to CDF (or
rather his own version of CDF as a first step, to avoid depending on an ever-
moving target right now) independently from Linaro.
- Alison Chaiken (Mentor Embedded Software) mentioned that Pengutronix is
working on panel support for the Freescale i.MX family.
- Linaro can probably also help extend the test coverage to various
platforms from its member companies.
- Finally, I'm working on CDF support for two display controllers found in
Renesas SoCs. One of them supports DBI and DPI, the other supports DPI only.
However, I can't easily test DBI support, as I don't have access to the
necessary hardware.
I explained at that point that there is currently no clear agreement on a bus
and operations model. The initial CDF proposal created Linux busses for DBI
and DSI (similar to I2C and SPI busses), with access to the control bus
implemented through those Linux busses, and access to the video bus
implemented through video operations on display entities. Tomi Valkeinen then
advocated for getting rid of the DBI and DSI Linux busses and implementing
access to both control and video through the display entity operations, while
Marcus Lorentzon wanted to implement all those operations at the Linux bus
level instead. The best way to arbitrate this will probably be to work on several
implementations and find out which one works better.
SONY Mobile currently supports DSI auto-probing, with plug-n-play detection of
DSI panels. The panel ID is first retrieved, and the correct panel driver is
then loaded. We will likely need to support a similar model. Another option
would be to write a single panel-dcs driver to support all DSI panels that
conform with the DSI and DCS standards (although we will very likely need
panel-specific quirks in that case). The two options could also coexist.
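A possible way for the two options to coexist is a match table with a generic fallback, sketched below in userspace C. The panel IDs, driver names and the `toy_panel_match` helper are all hypothetical; only the "panel-dcs" name comes from the discussion above, and nothing here describes the SONY Mobile implementation.

```c
#include <stddef.h>
#include <stdint.h>

/* One entry per panel-specific driver (IDs and names are made up). */
struct toy_panel_driver {
	uint32_t panel_id;
	const char *name;
};

static const struct toy_panel_driver toy_panel_drivers[] = {
	{ 0x1234, "toy-panel-a" },
	{ 0x5678, "toy-panel-b" },
};

/* Generic driver for panels that only need standard DCS commands. */
static const struct toy_panel_driver toy_panel_dcs = { 0, "panel-dcs" };

/*
 * Pick a driver for the panel ID read back over DSI: use a dedicated
 * driver when one is registered, otherwise fall back to the generic one.
 */
static const struct toy_panel_driver *toy_panel_match(uint32_t panel_id)
{
	size_t n = sizeof(toy_panel_drivers) / sizeof(toy_panel_drivers[0]);

	for (size_t i = 0; i < n; i++)
		if (toy_panel_drivers[i].panel_id == panel_id)
			return &toy_panel_drivers[i];
	return &toy_panel_dcs;
}
```

Panel-specific quirks for the generic driver could be attached to the same table, but that is left out of this sketch.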
We then moved to how display entities should be handled by KMS drivers and
mapped to KMS objects. The KMS model hardcodes the following fixed pipeline:
CRTC -> encoder -> connector
The CRTC is controlled by the display controller driver, and panels can be
mapped to KMS connector objects. What goes in-between is more of a gray area,
as hardware pipelines can have several encoders chained together.
I've presented one possible control flow that could solve the problem by
grouping multiple objects into an abstract entity. The right-most entity would
be a standalone entity, and every encoder but the left-most one in the chain
would hide the entities connected at their output. This results in a "russian
dolls" model, where encoders forward control operations to the entities they
embed, and forward video operations to the entity at their sink side.
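The forwarding of control operations in such a nested model can be sketched as follows. This is a deliberately minimal userspace illustration with made-up names (`toy_doll`, `toy_doll_set_state`); it covers only a linear chain and leaves out video operations, locking and reference counting.

```c
#include <stddef.h>

/* An entity that may hide another entity connected at its output. */
struct toy_doll {
	struct toy_doll *embedded;	/* hidden downstream entity, or NULL */
	int enabled;
};

/*
 * Control operation: apply the request locally, then forward it to the
 * embedded entity. Returns how many entities in the chain were reached.
 */
static int toy_doll_set_state(struct toy_doll *ent, int enabled)
{
	if (!ent)
		return 0;
	ent->enabled = enabled;
	return 1 + toy_doll_set_state(ent->embedded, enabled);
}
```

The caller only sees the outermost encoder; every entity further down the chain is reached indirectly through the one that embeds it.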
This can quickly become very complex, especially when locking and reference
counting are added to the model. Furthermore, this solution could only handle
linear pipelines, which will likely become a severe limitation in the future,
especially on embedded devices (for instance splitting a video stream between
two panels at the encoder level is a common use case, or driving a two-input
panel from two CRTCs).
Google asked whether this model tries to address both panels and
VGA(/HDMI/...) outputs. From what I've seen so far the only limits come from
the hardware engineers' (often^H^H^H^H^Hsometimes troubled) minds; all kinds of
data streams may appear in practice. As most systems will have one CRTC, one
encoder and one panel (or connector), we should probably try to keep the model
simple to start with, with 1:1 mappings between the KMS CRTC/encoder/connector
model and the CDF model. If we try to solve every possible problem right now
the complexity will explode and we won't be able to handle it. Getting a
simple solution upstream now and refactoring it later (there is no userspace
API involved, so no backward compatibility issue) might be the right answer. I
have no strong feeling about it, but I certainly want something I can get
upstream in a reasonable time frame.
Keith Packard bluntly (and totally rightfully) asked whether CDF is not just
duplicating part of the KMS API, and whether we shouldn't instead extend the
in-kernel KMS model to handle multiple encoders.
One reason that drove the creation of CDF outside of KMS was to support
sharing a single driver between multiple subsystems. For instance an HDMI
encoder could be connected to the output of a display controller handled by a
KMS driver on one board, and to the output of a video processor handled by a
V4L2 driver on another board. A panel could also be connected to a display
controller handled by a KMS driver on one board, and to a display controller
handled by an FBDEV driver on another board. Having a single driver for those
encoders or panels is one of the goals of CDF.
After publishing the first CDF RFC I realized there was a global consensus in
the kernel display community to deprecate FBDEV at some point. Sharing panel
drivers between KMS and FBDEV then became a "nice to have, but not important"
feature. As V4L2 doesn't handle panels (and shouldn't be extended to do so)
only encoder drivers would need to be shared, between KMS and V4L2.
It's important to note here that we don't need to share a given encoder
between two subsystems at runtime. On a given board the encoder will need to
be controlled by KMS or V4L2, but never both at the same time. In the CDF
context driver sharing refers to the ability to control a given driver from
either the KMS or V4L2 subsystem.
The discussion then moved to why V4L2 drivers for devices connected to an
encoder couldn't be moved to KMS. All display devices should be handled by
KMS, but we still have use cases where V4L2 needs to handle video outputs. For
instance a system with the following pipeline
HDMI con. -> HDMI RX -> Processing Engine -> HDMI TX -> HDMI con.
doesn't involve memory buffers in the processing pipeline. This can't be
handled by KMS, as KMS cannot represent a video pipeline without memory
in-between the receiving side and the display side. Hans Verkuil also mentioned
that for certain applications one prefers to center the API around frames, and
that V4L2 is ideal for instance for video conferencing/telephony.
Keith Packard thought we should just extend KMS to handle the V4L2 use cases.
V4L2 would then (somehow) plug its infrastructure into KMS. This topic has
already been discussed in the past, and I agree that extending the KMS model
to support "live sources" for CRTCs will be needed in the near future. This
could be the basis of other KMS enhancements to support more complex
pipelines. Making KMS and V4L2 cooperate is also desirable on the display side
to write the output of the CRTC back to memory. KMS has no write-back feature
in the API, V4L2 could come to the rescue there.
With this kind of extension it might be possible to handle the display part of
memory-less pipelines in KMS, although that might be quite a challenge. There
was no clear consensus on whether this was desirable.
Furthermore, only two HDMI encoders currently need to be shared (both are only
supported out-of-tree at the moment). As we don't expect more than a handful
of such use cases in the near future, it might not be worth the hassle to
create a complete infrastructure to handle a use case that might disappear if
we later move all the display-side drivers to KMS.
Another solution mentioned by Hans Verkuil would be to create helper functions
to translate V4L2 calls to KMS calls (to be clear, this only covers in-kernel
calls to encoders).
There was no clear consensus on this topic.
We then moved on to the hot-plug (and hot-unplug) issues following a question
from Google. Hot-plug is currently not supported. We would need to add
hot-plugging notifiers and possibly a couple of other operations. However, the
video common operations structure has bind/unbind operations that can serve
as a basis.
The hard part in hot-plugging support is actually hot-unplugging, as we need
to ensure that devices don't disappear all of a sudden while still in use.
This was a design goal of CDF from the start, and any issue there will need to
be resolved. Panels shouldn't be handled differently than HDMI connectors; CDF
will provide a common hot-plugging model.
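One common way to keep an entity alive while it is still in use is reference counting, sketched below in userspace C. This is an assumption about how the problem could be approached, not a description of the CDF code: the `toy_*` names are invented, and a real implementation would need locking or atomic counters.

```c
#include <stdbool.h>

struct toy_hotplug_entity {
	int refcount;	/* one reference is held by the registration itself */
	bool unplugged;	/* set once the hardware has gone away */
	bool released;	/* set when the last reference is dropped */
};

static void toy_entity_register(struct toy_hotplug_entity *ent)
{
	ent->refcount = 1;
	ent->unplugged = false;
	ent->released = false;
}

/* Users take a reference before use; this is refused after unplug. */
static bool toy_entity_get(struct toy_hotplug_entity *ent)
{
	if (ent->unplugged)
		return false;
	ent->refcount++;
	return true;
}

/* Drop a reference; the entity is released when the last one goes. */
static void toy_entity_put(struct toy_hotplug_entity *ent)
{
	if (--ent->refcount == 0)
		ent->released = true;	/* a real driver would free resources here */
}

/* Hot-unplug: refuse new users and drop the registration reference. */
static void toy_entity_unplug(struct toy_hotplug_entity *ent)
{
	if (ent->unplugged)
		return;
	ent->unplugged = true;
	toy_entity_put(ent);
}
```

With this scheme an unplugged entity stops accepting new users immediately, but its resources are only released once the last existing user has dropped its reference.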
Keith Packard then explained that DRM and KMS will likely be split in the
future. The main link between the DRM and KMS APIs is GEM objects. With the
recent addition of dmabuf to the Linux kernel the DRM and KMS APIs could be
split and use dmabuf to share buffers. DRM and KMS would then be exposed on
two separate device nodes. It would be a good idea to revisit the whole
KMS/V4L2 unification discussion once DRM and KMS are split.
We briefly touched the subject of namespaces, and whether CDF should use the
KMS namespace (drm_*). There is some resistance on the V4L2 side on having CDF
structures be KMS objects.
It was then time to wrap up the meeting, and I asked the audience one final
question: should we shoehorn complex pipelines into the KMS three-stage
model, or should we extend the KMS model? That was unfortunately answered by
silence, showing that more thinking is needed.
A couple more minutes of offline discussions briefly touched the topics of GPU
driver reverse engineering and whether we could, after the KMS/DRM split, set
a kernel-side standard for embedded GPU drivers. As interesting as this topic
is, CDF will not solve that problem :-)
--
Regards,
Laurent Pinchart
I'd like to get a discussion going about submitting the Android sync
driver to staging.
I know there is currently some very similar work going on with the
dmabuf-fences, and rather than both approaches being worked out
individually on their own, I suspect there could be better collaboration
around this effort.
So my proposal is that we merge the Android sync driver into staging.
In my mind, this has the following benefits:
1) It allows other drivers that depend on the sync interface to also be
submitted to staging, rather than forcing those drivers to be hidden
away in various out of tree git repos, location unknown.
2) It would provide a baseline view to the upstream community of the
interface Android is using, providing a real-world, active use case of
the functionality.
Once the sync driver is in staging, if the dmabuf-fences work is fully
sufficient to replace the Android sync driver, we should be able to
whittle down the sync driver until it's just an interface shim (and at
which point efforts can be made to convert Android userland over to
dmabuf-fences).
However, if the dmabuf-fences work is not fully sufficient to replace
the Android sync driver, we should be able to at least whittle down
the driver to those specific differences, which would provide a concrete
example of where the dmabuf-fences, or other work may need to be
expanded, or if maybe the sync driver is the better approach.
I've gone through the Android tree and reworked the sync driver to live
in staging, while still preserving the full patch history/authorship.
You can check out the reworked patch queue here:
http://git.linaro.org/gitweb?p=people/jstultz/android-dev.git;a=shortlog;h=…
If folks would take a look and let me know what they think of the
changes as well as what they think about pushing it to staging, or other
ideas for how to improve collaboration so we can have common interfaces
here, I'd appreciate it.
Also note: I've done this so far without any feedback from the Android
devs (despite my reaching out to Erik a few times recently), so if they
object to pushing it to staging, in deference to it being their code
I'll back off, even though I do think it would be good to have the code
get more visibility upstream in staging. I don't mean to step on
anyone's toes. :)
thanks
-john