On 12.06.19 at 10:15, Nicolin Chen wrote:
> Hi Christian,
>
> On Wed, Jun 12, 2019 at 08:05:53AM +0000, Koenig, Christian wrote:
>> On 12.06.19 at 10:02, Nicolin Chen wrote:
>> [SNIP]
>>> We haven't used DRM/GEM_PRIME yet, but I am also curious: would it
>>> benefit DRM as well if we reduced this overhead in dma_buf?
>> No, not at all.
> From your replies, in summary, does it mean that there won't be a case
> of DRM having a dma_buf attached to the same device, i.e. multiple calls
> of the drm_gem_prime_import() function with the same dev + dma_buf parameters?
Well, there are some cases where this happens. But in those cases we
intentionally want to get a new attachment :)
So thinking more about it, you would actually break those, and that is not
something we can do.
> If so, we can just ignore/drop this patch. Sorry for the misunderstanding.
It might be interesting for things like P2P, but even then it might be
better to just cache the P2P settings instead of the full attachment.
Regards,
Christian.
>
> Thanks
> Nicolin
On 12.06.19 at 10:02, Nicolin Chen wrote:
> Hi Christian,
>
> Thanks for the quick reply.
>
> On Wed, Jun 12, 2019 at 07:45:38AM +0000, Koenig, Christian wrote:
>> On 12.06.19 at 03:22, Nicolin Chen wrote:
>>> Commit f13e143e7444 ("dma-buf: start caching of sg_table objects v2")
>>> added support for caching the sgt pointer in the attachment to
>>> let users reuse the sgt pointer without another mapping. However, it
>>> might not fully work, as most dma-buf callers do attach()
>>> and map_attachment() back-to-back; take drm_prime.c for example:
>>> drm_gem_prime_import_dev() {
>>>     attach = dma_buf_attach() {
>>>         /* Allocating a new attach */
>>>         attach = kzalloc();
>>>         /* .... */
>>>         return attach;
>>>     }
>>>     dma_buf_map_attachment(attach, direction) {
>>>         /* attach->sgt would be always empty as attach is new */
>>>         if (attach->sgt) {
>>>             /* Reuse attach->sgt */
>>>         }
>>>         /* Otherwise, map it */
>>>         attach->sgt = map();
>>>     }
>>> }
>>>
>>> So, for a cache_sgt_mapping use case, it would need to get the same
>>> attachment pointer in order to reuse its sgt pointer. So this patch
>>> adds a refcount to the attach() function and lets it search for the
>>> existing attach pointer by matching the dev pointer.
>> I don't think that this is a good idea.
>>
>> We use sgt caching as a workaround for locking order problems and want to
>> remove it again in the long term.
> Oh. I thought it was meant as a performance improvement. It may
> be a misunderstanding then.
>
>> So what is the actual use case of this?
> We have some similar downstream changes in dma_buf to reduce the
> overhead from multiple clients of the same device doing attach()
> and map_attachment() calls for the same dma_buf.
I don't think that this is a good idea overall. A driver calling attach
for the same buffer is doing something wrong in the first place and we
should not work around this in the DMA-buf handling.
> We haven't used DRM/GEM_PRIME yet, but I am also curious: would it
> benefit DRM as well if we reduced this overhead in dma_buf?
No, not at all.
Regards,
Christian.
>
> Thanks
> Nicolin
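The point made in the quoted commit message, that a cached sgt only helps when the same attachment is mapped more than once, can be sketched with a small userspace model. This is only an illustration under stated assumptions: every name below (struct attachment_model, model_attach(), model_map(), map_calls) is hypothetical and none of it is the kernel's dma-buf code.

```c
#include <assert.h>
#include <stdlib.h>

/* Userspace stand-ins for the kernel structures; all names are hypothetical. */
struct sg_table_model { int id; };

struct attachment_model {
	struct sg_table_model *sgt;	/* cached mapping, NULL until first map */
};

static int map_calls;	/* counts how often the expensive mapping path ran */

/* Like dma_buf_attach() in the pseudo-code: always a new, zeroed object. */
static struct attachment_model *model_attach(void)
{
	return calloc(1, sizeof(struct attachment_model));
}

/* Like dma_buf_map_attachment() when the exporter caches the sgt. */
static struct sg_table_model *model_map(struct attachment_model *attach)
{
	if (attach->sgt)
		return attach->sgt;		/* cache hit: reuse the mapping */

	attach->sgt = malloc(sizeof(*attach->sgt));
	assert(attach->sgt);
	attach->sgt->id = ++map_calls;		/* cache miss: "map" the buffer */
	return attach->sgt;
}

static void model_detach(struct attachment_model *attach)
{
	free(attach->sgt);
	free(attach);
}

/* Number of real mappings done by two back-to-back attach+map pairs. */
static int maps_for_back_to_back_attach(void)
{
	struct attachment_model *a = model_attach();
	struct attachment_model *b = model_attach();
	int before = map_calls;

	assert(a && b);
	model_map(a);
	model_map(b);	/* b is new, so b->sgt is NULL and we map again */
	model_detach(a);
	model_detach(b);
	return map_calls - before;
}

/* Number of real mappings when one attachment is mapped twice. */
static int maps_for_reused_attach(void)
{
	struct attachment_model *a = model_attach();
	int before = map_calls;

	assert(a);
	model_map(a);
	model_map(a);	/* second call hits the cached sgt */
	model_detach(a);
	return map_calls - before;
}
```

In this model the cache never hits for callers that attach and map back-to-back, which is the observation that motivated the patch; it says nothing about whether sharing attachments is desirable.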
On 12.06.19 at 03:22, Nicolin Chen wrote:
> Commit f13e143e7444 ("dma-buf: start caching of sg_table objects v2")
> added support for caching the sgt pointer in the attachment to
> let users reuse the sgt pointer without another mapping. However, it
> might not fully work, as most dma-buf callers do attach()
> and map_attachment() back-to-back; take drm_prime.c for example:
> drm_gem_prime_import_dev() {
>     attach = dma_buf_attach() {
>         /* Allocating a new attach */
>         attach = kzalloc();
>         /* .... */
>         return attach;
>     }
>     dma_buf_map_attachment(attach, direction) {
>         /* attach->sgt would be always empty as attach is new */
>         if (attach->sgt) {
>             /* Reuse attach->sgt */
>         }
>         /* Otherwise, map it */
>         attach->sgt = map();
>     }
> }
>
> So, for a cache_sgt_mapping use case, it would need to get the same
> attachment pointer in order to reuse its sgt pointer. So this patch
> adds a refcount to the attach() function and lets it search for the
> existing attach pointer by matching the dev pointer.
I don't think that this is a good idea.
We use sgt caching as a workaround for locking order problems and want to
remove it again in the long term.
So what is the actual use case of this?
Regards,
Christian.
>
> Signed-off-by: Nicolin Chen <nicoleotsuka(a)gmail.com>
> ---
> drivers/dma-buf/dma-buf.c | 23 +++++++++++++++++++++++
> include/linux/dma-buf.h | 2 ++
> 2 files changed, 25 insertions(+)
>
> diff --git a/drivers/dma-buf/dma-buf.c b/drivers/dma-buf/dma-buf.c
> index f4104a21b069..d0260553a31c 100644
> --- a/drivers/dma-buf/dma-buf.c
> +++ b/drivers/dma-buf/dma-buf.c
> @@ -559,6 +559,21 @@ struct dma_buf_attachment *dma_buf_attach(struct dma_buf *dmabuf,
> if (WARN_ON(!dmabuf || !dev))
> return ERR_PTR(-EINVAL);
>
> + /* cache_sgt_mapping requires to reuse the same attachment pointer */
> + if (dmabuf->ops->cache_sgt_mapping) {
> + mutex_lock(&dmabuf->lock);
> +
> + /* Search for existing attachment and increase its refcount */
> + list_for_each_entry(attach, &dmabuf->attachments, node) {
> + if (dev != attach->dev)
> + continue;
> + atomic_inc_not_zero(&attach->refcount);
> + goto unlock_attach;
> + }
> +
> + mutex_unlock(&dmabuf->lock);
> + }
> +
> attach = kzalloc(sizeof(*attach), GFP_KERNEL);
> if (!attach)
> return ERR_PTR(-ENOMEM);
> @@ -575,6 +590,9 @@ struct dma_buf_attachment *dma_buf_attach(struct dma_buf *dmabuf,
> }
> list_add(&attach->node, &dmabuf->attachments);
>
> + atomic_set(&attach->refcount, 1);
> +
> +unlock_attach:
> mutex_unlock(&dmabuf->lock);
>
> return attach;
> @@ -599,6 +617,11 @@ void dma_buf_detach(struct dma_buf *dmabuf, struct dma_buf_attachment *attach)
> if (WARN_ON(!dmabuf || !attach))
> return;
>
> + /* Decrease the refcount for cache_sgt_mapping use cases */
> + if (dmabuf->ops->cache_sgt_mapping &&
> + atomic_dec_return(&attach->refcount))
> + return;
> +
> if (attach->sgt)
> dmabuf->ops->unmap_dma_buf(attach, attach->sgt, attach->dir);
>
> diff --git a/include/linux/dma-buf.h b/include/linux/dma-buf.h
> index 8a327566d7f4..65f12212ca2e 100644
> --- a/include/linux/dma-buf.h
> +++ b/include/linux/dma-buf.h
> @@ -333,6 +333,7 @@ struct dma_buf {
> * @dev: device attached to the buffer.
> * @node: list of dma_buf_attachment.
> * @sgt: cached mapping.
> + * @refcount: refcount of the attachment for the same device.
> * @dir: direction of cached mapping.
> * @priv: exporter specific attachment data.
> *
> @@ -350,6 +351,7 @@ struct dma_buf_attachment {
> struct device *dev;
> struct list_head node;
> struct sg_table *sgt;
> + atomic_t refcount;
> enum dma_data_direction dir;
> void *priv;
> };
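The lookup-by-device idea in the patch above can be sketched as a userspace model. This is a simplified illustration, not the kernel code: the names (struct refs_buf, refs_attach(), refs_detach()) are hypothetical, a plain int stands in for atomic_t, and the locking done with dmabuf->lock in the patch is left out.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins for struct device, dma_buf_attachment and dma_buf. */
struct refs_device { int id; };

struct refs_attachment {
	struct refs_device *dev;
	int refcount;			/* plain int here, atomic_t in the patch */
	struct refs_attachment *next;
};

struct refs_buf {
	struct refs_attachment *attachments;
};

/* Return the existing attachment for dev if there is one, else create it. */
static struct refs_attachment *refs_attach(struct refs_buf *buf,
					   struct refs_device *dev)
{
	struct refs_attachment *attach;

	for (attach = buf->attachments; attach; attach = attach->next) {
		if (attach->dev == dev) {
			attach->refcount++;	/* same device: share it */
			return attach;
		}
	}

	attach = calloc(1, sizeof(*attach));
	if (!attach)
		return NULL;
	attach->dev = dev;
	attach->refcount = 1;
	attach->next = buf->attachments;
	buf->attachments = attach;
	return attach;
}

/* Drop one reference; unlink and free on the last one. Returns 1 if freed. */
static int refs_detach(struct refs_buf *buf, struct refs_attachment *attach)
{
	struct refs_attachment **link;

	if (--attach->refcount > 0)
		return 0;

	for (link = &buf->attachments; *link; link = &(*link)->next) {
		if (*link == attach) {
			*link = attach->next;
			break;
		}
	}
	free(attach);
	return 1;
}
```

Two refs_attach() calls with the same device return the same pointer in this model. That is the behaviour the patch is after, and it is also what would break callers that intentionally want a second, independent attachment for the same device.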
On Mon, 27 May 2019 18:56:20 +0800 Christian Koenig wrote:
> Thanks for the comments, but you are looking at a completely outdated patchset.
>
> If you are interested in the newest one please ping me and I'm going to CC you
> when I send out the next version.
>
Ping...
Thanks
Hillf
Hi everybody,
The core idea in this patch set is that DMA-buf importers can now provide an optional invalidate callback. Using this callback and the reservation object, exporters can avoid pinning DMA-buf memory for a long time while sharing it between devices.
I already sent out an older version roughly a year ago, but didn't have time to look further into cleaning this up.
Last time, a major problem was that we would have had to fix up all drivers implementing DMA-buf at once.
Now I avoid this by allowing mappings to be cached in the DMA-buf attachment, so drivers can optionally move over to the new interface one by one.
This is also a prerequisite for my patch set enabling sharing of device memory with DMA-buf.
Please review and/or comment,
Christian.
Quoting Michael Yang (2019-05-14 08:55:37)
> On Thu, May 09, 2019 at 12:46:05PM +0100, Chris Wilson wrote:
> > Quoting Michael Yang (2019-05-09 05:34:11)
> > > If all the sync points were signaled in both fences a and b,
> > > there was only one sync point in the merged fence, which is a_fence[0].
> > > The Fence structure in the Android framework might be confused about the
> > > timestamp if any sync points were signaled after
> > > a_fence[0]. It might be more reasonable to use the timestamp of the last signaled
> > > sync point to represent the merged fence.
> > > The issue can be observed with the EGL extension ANDROID_get_frame_timestamps.
> > > Sometimes the return value of EGL_READS_DONE_TIME_ANDROID is ahead of
> > > the return value of EGL_RENDERING_COMPLETE_TIME_ANDROID.
> > > That means display/composition had been completed before rendering
> > > was completed, which is incorrect.
> > >
> > > Some discussion can be found at:
> > > https://android-review.googlesource.com/c/kernel/common/+/907009
> > >
> > > Signed-off-by: Michael Yang <michael.yang(a)imgtec.com>
> > > ---
> > > Hi,
> > > I haven't received a response since I sent this a month ago.
> > > Could someone take a look at it, please?
> > > Thanks.
> > > drivers/dma-buf/sync_file.c | 25 +++++++++++++++++++++++--
> > > 1 file changed, 23 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/drivers/dma-buf/sync_file.c b/drivers/dma-buf/sync_file.c
> > > index 4f6305c..d46bfe1 100644
> > > --- a/drivers/dma-buf/sync_file.c
> > > +++ b/drivers/dma-buf/sync_file.c
> > > @@ -274,8 +274,29 @@ static struct sync_file *sync_file_merge(const char *name, struct sync_file *a,
> > > for (; i_b < b_num_fences; i_b++)
> > > add_fence(fences, &i, b_fences[i_b]);
> > >
> > > - if (i == 0)
> > > - fences[i++] = dma_fence_get(a_fences[0]);
> > > + /* If all the sync pts were signaled, then adding the sync_pt who
> > > + * was the last signaled to the fence.
> > > + */
> > > + if (i == 0) {
> > > + struct dma_fence *last_signaled_sync_pt = a_fences[0];
> > > + int iter;
> > > +
> > > + for (iter = 1; iter < a_num_fences; iter++) {
> >
> > If there is more than one fence, sync_file->fence is a fence_array and
> > its timestamp is what you want. If there is one fence, sync_file->fence
> > is a pointer to that fence, and naturally has the right timestamp.
> >
> > In short, this should be handled by dma_fence_array_create(): when given
> > a complete set of signaled fences, it too should inherit the signaled
> > status, with the timestamp being taken from the last fence. It should
> > also be careful to inherit the error status.
> > -Chris
> Thanks, Chris, for the input. In this case, there will be only one fence
> in sync_file->fence after sync_file_merge(). In the current
> implementation, dma_fence_array_create() is not called, as num_fences is equal
> to 1. Are you suggesting that we pass the complete set of signaled
> fences to sync_file_set_fence() and handle it in dma_fence_array_create()?
> Thanks.
No, in the case there is only one fence, we just inherit its timestamp
along with its fence status. (A single fence is the degenerate case of
a fence array.)
-Chris
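The rule described in this exchange (a merged fence is signaled once all of its fences are, takes its timestamp from the last one to signal, and inherits an error) can be sketched as follows. This is a userspace model under stated assumptions, not the dma_fence or dma_fence_array API: struct fence_model and merged_fence_state() are hypothetical names, and picking the first non-zero error is an assumption, since the thread only says the error status should be inherited.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the state a fence carries. */
struct fence_model {
	bool signaled;
	int64_t timestamp_ns;	/* only meaningful once signaled */
	int error;		/* 0 or a negative errno */
};

/*
 * Fill *out with the state of one fence representing fences[0..n-1].
 * Returns false if there is nothing to merge (n == 0).
 */
static bool merged_fence_state(const struct fence_model *fences, size_t n,
			       struct fence_model *out)
{
	size_t i;

	if (n == 0)
		return false;

	out->signaled = true;
	out->timestamp_ns = fences[0].timestamp_ns;
	out->error = 0;

	for (i = 0; i < n; i++) {
		if (!fences[i].signaled)
			out->signaled = false;
		/* The merged timestamp is that of the last fence to signal. */
		if (fences[i].timestamp_ns > out->timestamp_ns)
			out->timestamp_ns = fences[i].timestamp_ns;
		/* Assumption: the first error seen is the one inherited. */
		if (!out->error && fences[i].error)
			out->error = fences[i].error;
	}

	/* No completion time exists until every fence has signaled. */
	if (!out->signaled)
		out->timestamp_ns = 0;
	return true;
}
```

With n == 1 the loop simply copies the single fence's timestamp and error, which matches the remark that a single fence is the degenerate case of a fence array.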
Quoting Michael Yang (2019-05-09 05:34:11)
> If all the sync points were signaled in both fences a and b,
> there was only one sync point in the merged fence, which is a_fence[0].
> The Fence structure in the Android framework might be confused about the
> timestamp if any sync points were signaled after
> a_fence[0]. It might be more reasonable to use the timestamp of the last signaled
> sync point to represent the merged fence.
> The issue can be observed with the EGL extension ANDROID_get_frame_timestamps.
> Sometimes the return value of EGL_READS_DONE_TIME_ANDROID is ahead of
> the return value of EGL_RENDERING_COMPLETE_TIME_ANDROID.
> That means display/composition had been completed before rendering
> was completed, which is incorrect.
>
> Some discussion can be found at:
> https://android-review.googlesource.com/c/kernel/common/+/907009
>
> Signed-off-by: Michael Yang <michael.yang(a)imgtec.com>
> ---
> Hi,
> I haven't received a response since I sent this a month ago.
> Could someone take a look at it, please?
> Thanks.
> drivers/dma-buf/sync_file.c | 25 +++++++++++++++++++++++--
> 1 file changed, 23 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/dma-buf/sync_file.c b/drivers/dma-buf/sync_file.c
> index 4f6305c..d46bfe1 100644
> --- a/drivers/dma-buf/sync_file.c
> +++ b/drivers/dma-buf/sync_file.c
> @@ -274,8 +274,29 @@ static struct sync_file *sync_file_merge(const char *name, struct sync_file *a,
> for (; i_b < b_num_fences; i_b++)
> add_fence(fences, &i, b_fences[i_b]);
>
> - if (i == 0)
> - fences[i++] = dma_fence_get(a_fences[0]);
> + /* If all the sync pts were signaled, then adding the sync_pt who
> + * was the last signaled to the fence.
> + */
> + if (i == 0) {
> + struct dma_fence *last_signaled_sync_pt = a_fences[0];
> + int iter;
> +
> + for (iter = 1; iter < a_num_fences; iter++) {
If there is more than one fence, sync_file->fence is a fence_array and
its timestamp is what you want. If there is one fence, sync_file->fence
is a pointer to that fence, and naturally has the right timestamp.
In short, this should be handled by dma_fence_array_create(): when given
a complete set of signaled fences, it too should inherit the signaled
status, with the timestamp being taken from the last fence. It should
also be careful to inherit the error status.
-Chris
On Mon, Apr 22, 2019 at 08:49:27PM +0200, Oscar Gomez Fuente wrote:
> These changes fix the "symbol was not declared" warnings for the functions
> ion_carveout_heap_create and ion_chunk_heap_create.
>
> Signed-off-by: Oscar Gomez Fuente <oscargomezf(a)gmail.com>
> ---
> drivers/staging/android/ion/ion_carveout_heap.c | 2 +-
> drivers/staging/android/ion/ion_chunk_heap.c | 2 +-
> 2 files changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/staging/android/ion/ion_carveout_heap.c b/drivers/staging/android/ion/ion_carveout_heap.c
> index bb9d614..3f359ae 100644
> --- a/drivers/staging/android/ion/ion_carveout_heap.c
> +++ b/drivers/staging/android/ion/ion_carveout_heap.c
> @@ -103,7 +103,7 @@ static struct ion_heap_ops carveout_heap_ops = {
> .unmap_kernel = ion_heap_unmap_kernel,
> };
>
> -struct ion_heap *ion_carveout_heap_create(phys_addr_t base, size_t size)
> +static inline struct ion_heap *ion_carveout_heap_create(phys_addr_t base, size_t size)
Why are you making it inline? Btw, normally we just leave it for the
compiler to choose which functions to make inline.
regards,
dan carpenter
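As a rough illustration of the point above: a function that is used only inside its own file can be marked plain `static`, which silences the "was not declared" warning without forcing `inline`. The sketch below uses hypothetical names (struct heap_model, heap_model_create(), heap_model_end()) and is not the ION code; whether ion_carveout_heap_create() really is file-local is not stated in this thread.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a heap descriptor; this is not struct ion_heap. */
struct heap_model {
	unsigned long base;
	size_t size;
};

/*
 * Used only in this file, so 'static' gives it internal linkage and no
 * declaration in a header is needed. There is no 'inline' keyword: the
 * compiler is free to inline the call on its own.
 */
static struct heap_model heap_model_create(unsigned long base, size_t size)
{
	struct heap_model heap = { base, size };

	assert(size != 0);
	return heap;
}

/*
 * A function that other files call keeps external linkage. Its prototype
 * would normally live in a shared header; it is repeated here only to keep
 * the example in one file.
 */
unsigned long heap_model_end(unsigned long base, size_t size);

unsigned long heap_model_end(unsigned long base, size_t size)
{
	struct heap_model heap = heap_model_create(base, size);

	return heap.base + heap.size;
}
```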
On top of those I have 6 more patches in the pipeline to enable VRAM P2P
with DMA-buf.
So that is not the end of the patch set :)
Christian.
On 17.04.19 at 15:52, Chunming Zhou wrote:
> Thanks Christian, great job. I will verify it this week when I finish my
> current work on hand.
>
> -David
>
> On 2019/4/17 2:38, Christian König wrote:
>> Hi everybody,
>>
>> The core idea in this patch set is that DMA-buf importers can now provide an optional invalidate callback. Using this callback and the reservation object, exporters can avoid pinning DMA-buf memory for a long time while sharing it between devices.
>>
>> I already sent out an older version roughly a year ago, but didn't have time to look further into cleaning this up.
>>
>> Last time, a major problem was that we would have had to fix up all drivers implementing DMA-buf at once.
>>
>> Now I avoid this by allowing mappings to be cached in the DMA-buf attachment, so drivers can optionally move over to the new interface one by one.
>>
>> This is also a prerequisite for my patch set enabling sharing of device memory with DMA-buf.
>>
>> Please review and/or comment,
>> Christian.
>>
>>
>> _______________________________________________
>> dri-devel mailing list
>> dri-devel(a)lists.freedesktop.org
>> https://lists.freedesktop.org/mailman/listinfo/dri-devel
On 3/29/19 7:26 PM, Zengtao (B) wrote:
> Hi laura:
>
>> -----Original Message-----
>> From: Laura Abbott [mailto:labbott@redhat.com]
>> Sent: Friday, March 29, 2019 9:27 PM
>> To: Zengtao (B) <prime.zeng(a)hisilicon.com>; sumit.semwal(a)linaro.org
>> Cc: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>; Arve Hjønnevåg
>> <arve(a)android.com>; Todd Kjos <tkjos(a)android.com>; Martijn Coenen
>> <maco(a)android.com>; Joel Fernandes <joel(a)joelfernandes.org>;
>> Christian Brauner <christian(a)brauner.io>; devel(a)driverdev.osuosl.org;
>> dri-devel(a)lists.freedesktop.org; linaro-mm-sig(a)lists.linaro.org;
>> linux-kernel(a)vger.kernel.org
>> Subject: Re: [PATCH] staging: android: ion: refactory ion_alloc for kernel
>> driver use
>>
>> On 3/29/19 11:40 AM, Zeng Tao wrote:
>>> There are two reasons for this patch:
>>> 1. There are some potential requirements for ion_alloc in kernel
>>> space: some media drivers need to allocate media buffers from ion
>>> instead of the buddy allocator or the dma framework, which is more
>>> convenient and cleaner for media drivers. In that case ion is the only
>>> media buffer provider, which makes it easier to maintain.
>>> 2. An fd is only needed by user processes, not by kernel space, so a
>>> dma_buf should be returned instead of an fd for kernel space, and
>>> dma_buf_fd should be called only for the userspace api.
>>>
>>
>> I really want to just NAK this because it doesn't seem like something
>> that's necessary. The purpose of Ion is to provide buffers to userspace
>> because there's no other way for userspace to get access to the memory.
>> The kernel already has other APIs to access the memory. This also
>> complicates the re-work that's been happening where the requirement is
>> only userspace.
>>
>> Can you be more detailed about which media drivers you are referring to
>> and why they can't just use other APIs?
>>
>
> I think I've got your point: ION is designed for userspace, but for kernel
> space we really lack something that plays the same role (allocating
> media memory, sharing the memory using dma_buf, providing debug and statistics
> for media memory).
>
> In fact, for kernel space we have the dma framework, dma-buf, etc.,
> and we can work on top of such APIs, but that means duplicated work (everyone has
> to maintain their own buffer sharing, debug and statistics).
> So we need something to do the common things (ION is the best choice for now).
>
Keep in mind that Ion is a thin shell of what it was, as most of the
debugging and statistics were removed because they were buggy. Most of that
should end up at the dma_buf layer, since it's really a dma_buf allocation
API.
> When ION was introduced, a lot of media memory frameworks existed and the
> dma framework was not so good, so ION heaps, integrated buffer sharing, statistics
> and a userspace API were the required features. Now the dma framework is more powerful and
> we don't even need ION heaps any more, but the userspace API, buffer sharing and statistics are
> still needed. The buffer sharing and statistics can be reworked and exported to kernel space
> rather than being used only by userspace, and that is my point.
>
I see what you are getting at but I don't think the same thing
applies to the kernel as it does userspace. We can enforce a
single way of using the dma_buf fd in userspace but the kernel
has a variety of ways to use dma_buf because each driver and
framework has its own needs. I'm still not convinced that adding
Ion APIs in the kernel is the right option since as you point out
we don't really need the heaps. That mostly leaves Ion as a wrapper
to handle doing the export. Maybe we could benefit from that
but I think it might require more thought.
I'd rather see a proposal in the media API itself showing what
you think is necessary but without using Ion. That would be
a good start so we could fully review what might make sense to
pull out of Ion into something common.
Thanks,
Laura
>>
>>> Signed-off-by: Zeng Tao <prime.zeng(a)hisilicon.com>
>>> ---
>>> drivers/staging/android/ion/ion.c | 32
>> +++++++++++++++++---------------
>>> 1 file changed, 17 insertions(+), 15 deletions(-)
>>>
>>> diff --git a/drivers/staging/android/ion/ion.c
>>> b/drivers/staging/android/ion/ion.c
>>> index 92c2914..e93fb49 100644
>>> --- a/drivers/staging/android/ion/ion.c
>>> +++ b/drivers/staging/android/ion/ion.c
>>> @@ -387,13 +387,13 @@ static const struct dma_buf_ops
>> dma_buf_ops = {
>>> .unmap = ion_dma_buf_kunmap,
>>> };
>>>
>>> -static int ion_alloc(size_t len, unsigned int heap_id_mask, unsigned
>>> int flags)
>>> +struct dma_buf *ion_alloc(size_t len, unsigned int heap_id_mask,
>>> + unsigned int flags)
>>> {
>>> struct ion_device *dev = internal_dev;
>>> struct ion_buffer *buffer = NULL;
>>> struct ion_heap *heap;
>>> DEFINE_DMA_BUF_EXPORT_INFO(exp_info);
>>> - int fd;
>>> struct dma_buf *dmabuf;
>>>
>>> pr_debug("%s: len %zu heap_id_mask %u flags %x\n", __func__,
>> @@
>>> -407,7 +407,7 @@ static int ion_alloc(size_t len, unsigned int
>> heap_id_mask, unsigned int flags)
>>> len = PAGE_ALIGN(len);
>>>
>>> if (!len)
>>> - return -EINVAL;
>>> + return ERR_PTR(-EINVAL);
>>>
>>> down_read(&dev->lock);
>>> plist_for_each_entry(heap, &dev->heaps, node) { @@ -421,10
>> +421,10
>>> @@ static int ion_alloc(size_t len, unsigned int heap_id_mask,
>> unsigned int flags)
>>> up_read(&dev->lock);
>>>
>>> if (!buffer)
>>> - return -ENODEV;
>>> + return ERR_PTR(-ENODEV);
>>>
>>> if (IS_ERR(buffer))
>>> - return PTR_ERR(buffer);
>>> + return ERR_PTR(PTR_ERR(buffer));
>>>
>>> exp_info.ops = &dma_buf_ops;
>>> exp_info.size = buffer->size;
>>> @@ -432,17 +432,12 @@ static int ion_alloc(size_t len, unsigned int
>> heap_id_mask, unsigned int flags)
>>> exp_info.priv = buffer;
>>>
>>> dmabuf = dma_buf_export(&exp_info);
>>> - if (IS_ERR(dmabuf)) {
>>> + if (IS_ERR(dmabuf))
>>> _ion_buffer_destroy(buffer);
>>> - return PTR_ERR(dmabuf);
>>> - }
>>>
>>> - fd = dma_buf_fd(dmabuf, O_CLOEXEC);
>>> - if (fd < 0)
>>> - dma_buf_put(dmabuf);
>>> -
>>> - return fd;
>>> + return dmabuf;
>>> }
>>> +EXPORT_SYMBOL(ion_alloc);
>>>
>>> static int ion_query_heaps(struct ion_heap_query *query)
>>> {
>>> @@ -539,12 +534,19 @@ static long ion_ioctl(struct file *filp, unsigned
>> int cmd, unsigned long arg)
>>> case ION_IOC_ALLOC:
>>> {
>>> int fd;
>>> + struct dma_buf *dmabuf;
>>>
>>> - fd = ion_alloc(data.allocation.len,
>>> + dmabuf = ion_alloc(data.allocation.len,
>>> data.allocation.heap_id_mask,
>>> data.allocation.flags);
>>> - if (fd < 0)
>>> + if (IS_ERR(dmabuf))
>>> + return PTR_ERR(dmabuf);
>>> +
>>> + fd = dma_buf_fd(dmabuf, O_CLOEXEC);
>>> + if (fd < 0) {
>>> + dma_buf_put(dmabuf);
>>> return fd;
>>> + }
>>>
>>> data.allocation.fd = fd;
>>>
>>>
>
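The return-type change in the quoted patch (ion_alloc() handing back a pointer that may encode an errno, with the ioctl path converting it to an fd) relies on the kernel's ERR_PTR()/PTR_ERR()/IS_ERR() convention. Below is a minimal userspace sketch of that convention. All names are hypothetical re-implementations for illustration (model_err_ptr(), model_ptr_err(), model_is_err(), struct ionbuf_model); the real helpers live in the kernel's linux/err.h, which also provides ERR_CAST() for the ERR_PTR(PTR_ERR(x)) pattern seen in the patch.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Errors are encoded in the top 4095 values of the address space. */
#define MODEL_MAX_ERRNO 4095

static void *model_err_ptr(long error)
{
	return (void *)(intptr_t)error;
}

static long model_ptr_err(const void *ptr)
{
	return (long)(intptr_t)ptr;
}

static int model_is_err(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MODEL_MAX_ERRNO;
}

/* Hypothetical stand-in for the exported buffer object. */
struct ionbuf_model { size_t len; };

/* Kernel-facing allocator: returns a buffer or an encoded errno. */
static struct ionbuf_model *model_alloc(size_t len)
{
	struct ionbuf_model *buf;

	if (!len)
		return model_err_ptr(-EINVAL);

	buf = malloc(sizeof(*buf));
	if (!buf)
		return model_err_ptr(-ENOMEM);
	buf->len = len;
	return buf;
}

/*
 * Userspace-facing wrapper, standing in for the ioctl path: it decodes the
 * pointer and returns 0 or a negative errno. The real code would go on to
 * call dma_buf_fd() here.
 */
static int model_alloc_for_user(size_t len, struct ionbuf_model **out)
{
	struct ionbuf_model *buf = model_alloc(len);

	if (model_is_err(buf))
		return (int)model_ptr_err(buf);
	*out = buf;
	return 0;
}
```

The split keeps the fd-specific step in the one caller that needs it, which is the structure the patch proposes; whether exporting such an allocator to kernel users is desirable is what the thread disputes.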