On 07-11-13 22:11, Rom Lemarchand wrote:
> Hi Maarten, I tested your changes and needed the attached patch: behavior
> now seems equivalent to android sync. I haven't tested performance.
>
> The issue resolved by this patch happens when i_b < b->num_fences and
> i_a >= a->num_fences (or vice versa). Then, pt_a is invalid and so
> dereferencing pt_a->context causes a crash.
>
Yeah, I pushed my original fix. I intended to keep android userspace behavior the same, and I tried to keep the kernelspace API the same as much as I could. If performance is the same, or not noticeably worse, would there be any objections on the android side to renaming dma-fence to syncpoint, and getting it in mainline?
~Maarten
On 07-11-13 22:11, Rom Lemarchand wrote:
> Hi Maarten, I tested your changes and needed the attached patch: behavior
> now seems equivalent to android sync. I haven't tested performance.
>
> The issue resolved by this patch happens when i_b < b->num_fences and
> i_a >= a->num_fences (or vice versa). Then, pt_a is invalid and so
> dereferencing pt_a->context causes a crash.
Oops, thinko. :) Originally I had it correct by doing this:
+	/*
+	 * Assume sync_fence a and b are both ordered and have no
+	 * duplicates with the same context.
+	 *
+	 * If a sync_fence can only be created with sync_fence_merge
+	 * and sync_fence_create, this is a reasonable assumption.
+	 */
+	for (i = i_a = i_b = 0; i_a < a->num_fences && i_b < b->num_fences; ) {
+		struct fence *pt_a = a->cbs[i_a].sync_pt;
+		struct fence *pt_b = b->cbs[i_b].sync_pt;
+
+		if (pt_a->context < pt_b->context) {
+			sync_fence_add_pt(fence, &i, pt_a);
+
+			i_a++;
+		} else if (pt_a->context > pt_b->context) {
+			sync_fence_add_pt(fence, &i, pt_b);
+
+			i_b++;
+		} else {
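+			/*
+			 * Same context: keep the later of the two points;
+			 * the subtraction below is a wraparound-safe
+			 * seqno comparison.
+			 */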
+			if (pt_a->seqno - pt_b->seqno <= INT_MAX)
+				sync_fence_add_pt(fence, &i, pt_a);
+			else
+				sync_fence_add_pt(fence, &i, pt_b);
+
+			i_a++;
+			i_b++;
+		}
+	}
+
+	/* Add remaining fences from a or b */
+	for (; i_a < a->num_fences; i_a++)
+		sync_fence_add_pt(fence, &i, a->cbs[i_a].sync_pt);
+
+	for (; i_b < b->num_fences; i_b++)
+		sync_fence_add_pt(fence, &i, b->cbs[i_b].sync_pt);
Then I thought I could clean it up by merging the loops, but that ended up being
more unreadable and crashing... so I guess I'll revert to this version. :)
Anyone else but me that feels such a function could be useful?
My main use-case is that it would resolve the mutual refcounting problem:
1) drm buffer object caches a dma_buf pointer which it refcounts
2) The dma-buf holds a refcount to the buffer.
This is resolved today by having the user-space visible part of the
drm buffer hold the refcount to the dma_buf. When user-space closes
the drm buffer, the reference goes away, and eventually the buffer is
freed once all external dma-buf users are done with the dma-buf.
However, this also means that the dma-buf remains for the buffer's
lifetime even when there are no external users, which bugs me a bit.
This can be resolved by viewing the drm buffer as a lookup structure
that doesn't hold a refcount to the dma-buf, but that means that the
lookup structure (buffer) would need to share locks with the dma-buf
implementation, unless we have a get_dma_buf_unless_zero(), in which
case we can use locks local to the lookup structure, the drm buffer.
(See the last part of the kref documentation for a detailed discussion
of this).
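To make the lookup side concrete, something like this is what I have in
mind (untested sketch only; it assumes the dma-buf refcount were exposed
as a kref, which isn't the case today, and drm_buffer / lookup_lock are
made-up names):

/*
 * Sketch only: pretends the dma-buf refcount is a kref (today it is
 * tied to the backing file), purely to illustrate the lookup pattern.
 */
static struct dma_buf *drm_buffer_get_dma_buf(struct drm_buffer *buf)
{
	struct dma_buf *dmabuf = NULL;

	/* Lock local to the lookup structure, not the dma-buf code. */
	mutex_lock(&buf->lookup_lock);
	if (buf->dma_buf && kref_get_unless_zero(&buf->dma_buf->refcount))
		dmabuf = buf->dma_buf;	/* we now hold a reference */
	mutex_unlock(&buf->lookup_lock);

	return dmabuf;
}

The dma-buf release path would then clear buf->dma_buf under
lookup_lock before the object goes away, exactly as the kref
documentation describes.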
Now I don't think keeping the dma_buf for the drm buffer lifetime is a
HUGE problem, but I just wanted to get people's views of this.
Thanks,
Thomas
Hi!
I'm just looking over what's needed to implement drm Prime / dma-buf
exports + imports in the vmwgfx driver. It seems like most dma-bufs ops
are quite straightforward to implement except user-space mmap().
The reason is that vmwgfx dma-bufs will be using completely
non-coherent memory whenever CPU access is needed.
The accelerated contents reside in an opaque structure on the device
which we can DMA to and from, so for mmap to work we need to zap
ptes and DMA to the device when doing something accelerated, and on the
first page-fault DMA data back and wait for idle if the device did a
write to the dma-buf.
Now this shouldn't really be a problem if dma-bufs were only used for
cross-device sharing, but since people apparently want to use dma-buf
file handles to share CPU data between processes it really becomes a
serious problem.
Needless to say we'd want to limit the size of the DMAs, and have mmap
users request regions for read, and mark regions dirty for write,
something similar to gallium's texture transfer objects.
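To give an idea of the fault-side handling I'm thinking of (very rough
sketch, all names below are made up for illustration):

static int vmw_prime_fault(struct vm_area_struct *vma, struct vm_fault *vmf)
{
	struct vmw_prime_obj *obj = vma->vm_private_data;
	struct page *page;

	/*
	 * If the device wrote to the opaque structure since the CPU
	 * last looked, DMA the data back and wait for idle before
	 * faulting the page in.
	 */
	if (obj->device_dirty) {
		if (vmw_prime_readback(obj))	/* made-up helper */
			return VM_FAULT_SIGBUS;
		obj->device_dirty = false;
	}

	page = obj->pages[vmf->pgoff];
	get_page(page);
	vmf->page = page;
	return 0;
}

The accelerated path would do the opposite: unmap_mapping_range() to zap
the ptes and DMA the dirty regions to the device before submitting.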
Any ideas?
/Thomas
Hey,
So I took a look at the sync stuff in android; in a lot of ways I believe the two approaches are similar, yet subtly different.
Most of the stuff I looked at is from the sync.h header in drivers/staging, so maybe my knowledge is incomplete.
The timeline is similar to what I called a fence context. Each command stream on a gpu can have a context. Because
nvidia hardware can have 4095 separate timelines, I didn't want to keep the bookkeeping for each timeline, although
I guess that it's already done. Maybe it could be done in a unified way for each driver, making a transition to
timelines that can be used by android easier.
I did not have an explicit syncpoint addition, but I think that sync points + sync_fence were similar to what I did with
my dma-fence stuff, except slightly different.
In my approach the dma-fence is signaled after all sync_points are done AND the queued commands are executed.
In effect the dma-fence becomes the next syncpoint, depending on all previous dma-fence syncpoints.
An important thing to note is that dma-fence is kernelspace only, so it might be better to rename it to syncpoint,
and use fence for the userspace interface.
A big difference is locking, I assume in my code that most fences emitted are not waited on, so the fastpath
fence_signal is a test_and_set_bit plus test_bit. A single lock is used for the waitqueue and callbacks,
with the waitqueue being implemented internally as an asynchronous callback. The lock is provided by the driver,
which makes adding support for old hardware that has no reliable way of notifying completion of events easier.
I avoided using global locks, but I think for debugfs support I may end up having to add some.
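To illustrate, the signaling path is roughly like this (simplified
sketch; the flag and member names below are illustrative and may not
match the final patches exactly):

int fence_signal(struct fence *fence)
{
	struct fence_cb *cur, *tmp;
	unsigned long flags;

	/* Fastpath: just an atomic bit op when nobody is waiting. */
	if (test_and_set_bit(FENCE_FLAG_SIGNALED_BIT, &fence->flags))
		return -EINVAL;	/* already signaled */

	if (!test_bit(FENCE_FLAG_ENABLE_SIGNAL_BIT, &fence->flags))
		return 0;

	/*
	 * Slowpath: run the callbacks (the waitqueue is implemented as
	 * one of them) under the single, driver-provided lock.
	 */
	spin_lock_irqsave(fence->lock, flags);
	list_for_each_entry_safe(cur, tmp, &fence->cb_list, node) {
		list_del_init(&cur->node);
		cur->func(fence, cur);
	}
	spin_unlock_irqrestore(fence->lock, flags);

	return 0;
}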
The dma fence looks similar overall, except that I allow overriding some stuff and keep less track of state.
I do believe that I can create a userspace interface around dma_fence that works similar to android, and the
kernel space interface could be done in a similar way too.
One thing though: is it really required to merge fences? It seems to me that if I add a poll callback userspace
could simply do a poll on a list of fences. This would give userspace all the information it needs about each
individual fence.
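For example, something along these lines in userspace would do (sketch;
it assumes each fence is exposed as an fd that reports POLLIN once
signaled):

#include <poll.h>
#include <stdlib.h>

/* Wait until every fence fd in the array has signaled. */
static int wait_all_fences(const int *fds, int count)
{
	struct pollfd *p = calloc(count, sizeof(*p));
	int i, left = count;

	if (!p)
		return -1;

	for (i = 0; i < count; i++) {
		p[i].fd = fds[i];
		p[i].events = POLLIN;
	}

	while (left > 0 && poll(p, count, -1) >= 0) {
		for (i = 0; i < count; i++) {
			if (p[i].fd >= 0 && p[i].revents) {
				p[i].fd = -1;	/* poll() skips negative fds */
				left--;
			}
		}
	}

	free(p);
	return left ? -1 : 0;
}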
The thing about wait/wound mutexes can be ignored for this discussion. It's really just a method of adding a
fence to a dma-buf, and building a list of all dma-fences to wait on in the kernel before starting a command
buffer, and setting a new fence to all the dma-bufs to signal completion of the event. Regardless of the sync
mechanism we'll decide on, this stuff wouldn't change.
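Roughly, per command buffer it boils down to something like this
(sketch only, all names made up):

/*
 * Gather the fences currently attached to the dma-bufs a command
 * buffer uses, then attach the new fence that will signal when this
 * command buffer completes.
 */
static void job_sync_dmabufs(struct job *job, struct dmabuf_slot *slots, int n)
{
	int i;

	for (i = 0; i < n; i++) {
		/* Wait for whoever touched the buffer last... */
		if (slots[i].fence)
			job_add_dependency(job, slots[i].fence);

		/* ...and make later users wait for this job. */
		fence_put(slots[i].fence);
		slots[i].fence = fence_get(job->fence);
	}
}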
Depending on feedback I'll try reflashing my nexus 7 to stock android, and work on trying to convert android
syncpoints to dma-fence, which I'll probably rename to syncpoints.
~Maarten
Hi!
Considering that the linux DMA-API states that information in an sg-list
may be destroyed when it's mapped,
it seems to me that at least one of the drm prime functions uses invalid
assumptions.
In particular, I don't think it's safe to assume that pages in a single
sg-list segment are contiguous after mapping, so
if we want struct page pointers we should use
pfn = dma_to_phys(dev, sg_dma_address(sg) + p_offset * PAGE_SIZE) >> PAGE_SHIFT
and, if the pfn is valid, convert it to a struct page.
(Incorrect code is, for example, in drm_prime_sg_to_page_addr_arrays)
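For illustration, walking the mapped list without assuming per-segment
page contiguity would look roughly like this (sketch only; 'dev' is the
device the list was mapped for and 'pages' a caller-provided array):

struct scatterlist *sg;
unsigned int i, j = 0;

for_each_sg(sgt->sgl, sg, sgt->nents, i) {
	unsigned int p, npages = PAGE_ALIGN(sg_dma_len(sg)) >> PAGE_SHIFT;

	for (p = 0; p < npages; ++p) {
		unsigned long pfn = dma_to_phys(dev, sg_dma_address(sg) +
						p * PAGE_SIZE) >> PAGE_SHIFT;

		if (pfn_valid(pfn))
			pages[j++] = pfn_to_page(pfn);
	}
}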
Or does dma-buf require that page info in sg-lists be kept
across the map operation?
BTW this brings up another question: It's stated that the above function
is needed by the TTM driver in order to do
correct fault handling. This seems odd; TTM shouldn't be able to mmap()
or fault an imported dma-buf, right?
Thanks,
Thomas
On Thu, 2013-10-17 at 13:37 -0500, Matt Sealey wrote:
> This may be late, but please can you consider re-using the CHRP
> reserved node (i.e. device_type = "reserved")?
>
> Since it does exactly the same thing, is well defined since the dark ages?
>
> It's CHRP 1.7 section 5.9 by the way (just before /chosen gets defined).
>
> It would solve a selection of the issues; and require zero binding
> work except describing perhaps a couple freakish Linux-specific
> properties that may be only as intrusive as, say, linux,initrd would
> be in /chosen.
>
> The most effective, multi-OS way of using it ("available" property not
> currently implemented in Linux for some reason, but it could come in
> so handy - not only because it matches the way Linux resource
> structures are handled)
First, the original /reserved on CHRP was supposed to be about reserved
bus space for things like hidden HW devices, but yes, it could be used for
that. However, it would be nice to enrich the binding to provide at least
some kind of specific identification of what a given reserved area is about;
see my comments about that in the previous threads.
The available property is of no use to us. It purely indicates what is
available while OFW is still running. Once we get rid of OFW its content
is utterly meaningless.
The original OFW was designed with the idea that OFW remains alive along
with the operating system, and that has been done on some platforms, but
that idea was ditched very early on in powerpc space for many
reasons, one of them being that most implementations of OFW around were
way too broken to bother with.
> memory@0x70000000 {
> device_type = "memory";
> reg = <0x70000000 0x40000000>;
> available = <0x70000000 0x10000000
> 0x90000000 0x1ffc00000>; /* top 16KiB of memory
> is where the secure firmware keeps its mailboxes */
> };
> freaky-codec-memory: reserved@0x80000000 {
> device_type = "reserved";
> reg = <0x80000000 0x10000000>;
> available = <0x80000000 0x8000000
> 0x88000000 0x8000000>; /* two 128MiB buffers */
> non-objectionable-mark-as-contiguous-property-name-here;
> cacheable;
> };
>
> Any driver that has, previously, required a bunch of its own memory
> carved out of DDR *should* be gaining a phandle reference to that
> reserved node however it likes (it would be up to that devices'
> binding).
>
> On Linux under CMA, it may well be ignored and just stuffed into the
> generic CMA regions, and the driver MAY allocate anywhere it likes,
> but it COULD ask for memory based on a region phandle or, horribly, by
> name (since there's no other way to search for it, the OF "name" for
> reserved SHALL be "reserved") and be given memory in that region
> defined by the reserved node if it had any addressing restrictions.
>
> /videodecoder@0x43f01000 {
> compatible = "freaky,codec";
> :
> decode-buffer = &freaky-codec-memory;
> };
>
> On another OS it may manually map and use a custom allocator for that
> memory region, since otherwise the OS would not have even looked at
> it.
>
> Also this discussion of Jeremy Kerr's proposal seems to be 'missing'
> on Google. Do you happen to have a link to it?
>
> Thanks,
> Matt Sealey <neko(a)bakuhatsu.net>
Hello,
I thought that the v6 series was really ready for merging, but today I found
a stupid bug which broke compilation when CONFIG_CMA was not enabled, so
another respin is needed. I've also taken the opportunity to fix
some minor issues in the documentation (renamed all nodes to 'region')
and rebased the patches onto Aneesh Kumar K.V's patch queued earlier for
merging.
For more information, please refer to the V6 patches thread:
http://thread.gmane.org/gmane.linux.ports.arm.kernel/261518
Best regards
Marek Szyprowski
Samsung R&D Institute Poland
Changelog:
v7:
- fixed compilation break when CMA support has been disabled
- rebased onto Aneesh Kumar K.V's CMA patches queued earlier for merging
- more cleanups in the binding documentation (thanks to Stephen Warren)
v6: http://thread.gmane.org/gmane.linux.ports.arm.kernel/261518
- added support for 'status' property, so memory regions can be disabled
like any other node
- fixed issues pointed out by Rob: removed annotations from function
declarations and replaced macros with static inline functions.
- restored of_scan_flat_dt_by_path() function to simplify reserved memory
scanning function
- the code now uses #size-cells/#address-cells properties from root node
for interpreting 'reg' property in reserved memory regions
- fixed some issues in dt binding documentation
v5: http://thread.gmane.org/gmane.linux.ports.arm.kernel/259278
- renamed "contiguous-memory-region" compatibility string to
"linux,contiguous-memory-region" (this one is really specific to Linux
kernel implementation)
- renamed "dma-memory-region" property to "memory-region" (suggested by
Kumar Gala)
- added support for #address-cells, #size-cells for memory regions
(thanks to Rob Herring for suggestion)
- removed generic code to scan specific path in flat device tree (it cannot
be used for one-pass, fdt-scan-based initialization of memory regions with
#address-cells and #size-cells parsing)
- replaced dma_contiguous_set_default_area() and dma_contiguous_add_device()
functions with dev_set_cma_area() call
v4: http://thread.gmane.org/gmane.linux.ports.arm.kernel/256491
- corrected Device Tree mailing list address (resend)
- moved back contiguous-memory bindings from /chosen/contiguous-memory
to /memory nodes as suggested by Grant (see
http://article.gmane.org/gmane.linux.drivers.devicetree/41030
for more details)
- added support for DMA reserved memory with dma_declare_coherent()
- moved code to drivers/of/of_reserved_mem.c
- added generic code to scan specific path in flat device tree
v3: http://thread.gmane.org/gmane.linux.drivers.devicetree/40013/
- fixed issues pointed out by Laura and updated documentation
v2: http://thread.gmane.org/gmane.linux.drivers.devicetree/34075
- moved contiguous-memory bindings from /memory to /chosen/contiguous-memory/
node to avoid spreading Linux specific parameters over the whole device
tree definitions
- added support for autoconfigured regions (use zero base)
- fixed minor bugs
v1: http://thread.gmane.org/gmane.linux.drivers.devicetree/30111/
- initial proposal
Patch summary:
Marek Szyprowski (4):
drivers: dma-contiguous: clean source code and prepare for device
tree
drivers: of: add function to scan fdt nodes given by path
drivers: of: add initialization code for dma reserved memory
ARM: init: add support for reserved memory defined by device tree
Documentation/devicetree/bindings/memory.txt | 168 +++++++++++++++++++++++++
arch/arm/include/asm/dma-contiguous.h | 1 -
arch/arm/mm/init.c | 3 +
arch/x86/include/asm/dma-contiguous.h | 1 -
drivers/base/dma-contiguous.c | 119 +++++++-----------
drivers/of/Kconfig | 6 +
drivers/of/Makefile | 1 +
drivers/of/fdt.c | 76 +++++++++++
drivers/of/of_reserved_mem.c | 175 ++++++++++++++++++++++++++
drivers/of/platform.c | 4 +
include/asm-generic/dma-contiguous.h | 28 -----
include/linux/device.h | 2 +-
include/linux/dma-contiguous.h | 62 ++++++++-
include/linux/of_fdt.h | 3 +
include/linux/of_reserved_mem.h | 14 +++
15 files changed, 555 insertions(+), 108 deletions(-)
create mode 100644 Documentation/devicetree/bindings/memory.txt
create mode 100644 drivers/of/of_reserved_mem.c
delete mode 100644 include/asm-generic/dma-contiguous.h
create mode 100644 include/linux/of_reserved_mem.h
--
1.7.9.5