On Tue, Feb 26, 2019 at 11:20 PM Hyun Kwon <hyun.kwon(a)xilinx.com> wrote:
>
> Hi Daniel,
>
> Thanks for the comment.
>
> On Tue, 2019-02-26 at 04:06:13 -0800, Daniel Vetter wrote:
> > On Tue, Feb 26, 2019 at 12:53 PM Greg Kroah-Hartman
> > <gregkh(a)linuxfoundation.org> wrote:
> > >
> > > On Sat, Feb 23, 2019 at 12:28:17PM -0800, Hyun Kwon wrote:
> > > > Add the dmabuf map / unmap interfaces. This allows the user driver
> > > > to import an external dmabuf and use it from user space.
> > > >
> > > > Signed-off-by: Hyun Kwon <hyun.kwon(a)xilinx.com>
> > > > ---
> > > >  drivers/uio/Makefile         |   2 +-
> > > >  drivers/uio/uio.c            |  43 +++++++++
> > > >  drivers/uio/uio_dmabuf.c     | 210 +++++++++++++++++++++++++++++++++++++++++++
> > > >  drivers/uio/uio_dmabuf.h     |  26 ++++++
> > > >  include/uapi/linux/uio/uio.h |  33 +++++++
> > > >  5 files changed, 313 insertions(+), 1 deletion(-)
> > > > create mode 100644 drivers/uio/uio_dmabuf.c
> > > > create mode 100644 drivers/uio/uio_dmabuf.h
> > > > create mode 100644 include/uapi/linux/uio/uio.h
> > > >
> > > > diff --git a/drivers/uio/Makefile b/drivers/uio/Makefile
> > > > index c285dd2..5da16c7 100644
> > > > --- a/drivers/uio/Makefile
> > > > +++ b/drivers/uio/Makefile
> > > > @@ -1,5 +1,5 @@
> > > > # SPDX-License-Identifier: GPL-2.0
> > > > -obj-$(CONFIG_UIO) += uio.o
> > > > +obj-$(CONFIG_UIO) += uio.o uio_dmabuf.o
> > > > obj-$(CONFIG_UIO_CIF) += uio_cif.o
> > > > obj-$(CONFIG_UIO_PDRV_GENIRQ) += uio_pdrv_genirq.o
> > > > obj-$(CONFIG_UIO_DMEM_GENIRQ) += uio_dmem_genirq.o
> > > > diff --git a/drivers/uio/uio.c b/drivers/uio/uio.c
> > > > index 1313422..6841f98 100644
> > > > --- a/drivers/uio/uio.c
> > > > +++ b/drivers/uio/uio.c
> > > > @@ -24,6 +24,12 @@
> > > > #include <linux/kobject.h>
> > > > #include <linux/cdev.h>
> > > > #include <linux/uio_driver.h>
> > > > +#include <linux/list.h>
> > > > +#include <linux/mutex.h>
> > > > +
> > > > +#include <uapi/linux/uio/uio.h>
> > > > +
> > > > +#include "uio_dmabuf.h"
> > > >
> > > > #define UIO_MAX_DEVICES (1U << MINORBITS)
> > > >
> > > > @@ -454,6 +460,8 @@ static irqreturn_t uio_interrupt(int irq, void *dev_id)
> > > > struct uio_listener {
> > > > struct uio_device *dev;
> > > > s32 event_count;
> > > > + struct list_head dbufs;
> > > > + struct mutex dbufs_lock; /* protect @dbufs */
> > > > };
> > > >
> > > > static int uio_open(struct inode *inode, struct file *filep)
> > > > @@ -500,6 +508,9 @@ static int uio_open(struct inode *inode, struct file *filep)
> > > > if (ret)
> > > > goto err_infoopen;
> > > >
> > > > + INIT_LIST_HEAD(&listener->dbufs);
> > > > + mutex_init(&listener->dbufs_lock);
> > > > +
> > > > return 0;
> > > >
> > > > err_infoopen:
> > > > @@ -529,6 +540,10 @@ static int uio_release(struct inode *inode, struct file *filep)
> > > > struct uio_listener *listener = filep->private_data;
> > > > struct uio_device *idev = listener->dev;
> > > >
> > > > + ret = uio_dmabuf_cleanup(idev, &listener->dbufs, &listener->dbufs_lock);
> > > > + if (ret)
> > > > + dev_err(&idev->dev, "failed to clean up the dma bufs\n");
> > > > +
> > > > mutex_lock(&idev->info_lock);
> > > > if (idev->info && idev->info->release)
> > > > ret = idev->info->release(idev->info, inode);
> > > > @@ -652,6 +667,33 @@ static ssize_t uio_write(struct file *filep, const char __user *buf,
> > > > return retval ? retval : sizeof(s32);
> > > > }
> > > >
> > > > +static long uio_ioctl(struct file *filep, unsigned int cmd, unsigned long arg)
> > >
> > > We have resisted adding a uio ioctl for a long time, can't you do this
> > > through sysfs somehow?
> > >
> > > A meta-comment about your ioctl structure:
> > >
> > > > +#define UIO_DMABUF_DIR_BIDIR 1
> > > > +#define UIO_DMABUF_DIR_TO_DEV 2
> > > > +#define UIO_DMABUF_DIR_FROM_DEV 3
> > > > +#define UIO_DMABUF_DIR_NONE 4
> > >
> > > enumerated type?
> > >
> > > > +
> > > > +struct uio_dmabuf_args {
> > > > + __s32 dbuf_fd;
> > > > + __u64 dma_addr;
> > > > + __u64 size;
> > > > + __u32 dir;
> > >
> > > Why the odd alignment? Are you sure this is the best packing for such a
> > > structure?
> > >
> > > Why is dbuf_fd __s32? dir can be __u8, right?
> > >
> > > I don't know that dma layer very well, it would be good to get some
> > > review from others to see if this really is even a viable thing to do.
> > > The fd handling seems a bit "odd" here, but maybe I just do not
> > > understand it.
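For illustration only, one way to address both review points above (an
enumerated direction type and a hole-free layout) might be the following
sketch; it is an assumption about a possible layout, not the uapi header
from the actual patch:

	enum uio_dmabuf_dir {
		UIO_DMABUF_DIR_BIDIR = 1,
		UIO_DMABUF_DIR_TO_DEV,
		UIO_DMABUF_DIR_FROM_DEV,
		UIO_DMABUF_DIR_NONE,
	};

	struct uio_dmabuf_args {
		__u64 dma_addr;	/* 64-bit members first, no implicit padding */
		__u64 size;
		__s32 dbuf_fd;	/* fds are signed ints, so __s32 is defensible */
		__u8  dir;	/* values from enum uio_dmabuf_dir */
		__u8  pad[3];	/* explicit padding keeps the ABI unambiguous */
	};

Packed this way the structure is 24 bytes with no compiler-inserted holes.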
> >
> > Frankly looks like a ploy to sidestep review by graphics folks. We'd
> > ask for the userspace first :-)
>
> Please refer to pull request [1].
>
> If you are interested in more details: libmetal is the abstraction layer
> which provides platform-independent APIs. The backend implementation
> can be selected per platform, e.g. RTOS, Linux, standalone (Xilinx),
> and so on. For Linux, it supports UIO / vfio as of now.
> The actual user-space drivers sit on top of libmetal. Such drivers can be
> found in [2]. This is why I try to avoid any device-specific code in the
> Linux kernel.
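> To make the intended usage concrete, a user-space driver (for example the
> libmetal UIO backend) would do roughly the following. The ioctl command
> name below is only a placeholder, since the uapi header body is not quoted
> in this mail:
>
> 	struct uio_dmabuf_args args = {
> 		.dbuf_fd = dbuf_fd,		/* fd from the dmabuf allocator */
> 		.dir = UIO_DMABUF_DIR_BIDIR,
> 	};
>
> 	/* UIO_IOC_MAP_DMABUF is a placeholder name */
> 	if (ioctl(uio_fd, UIO_IOC_MAP_DMABUF, &args) < 0)
> 		return -1;
>
> 	/* the kernel imports the buffer and fills in args.dma_addr and
> 	 * args.size, which the driver then programs into device registers */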
>
> >
> > Also, exporting dma_addr to userspace is considered a very bad idea.
>
> I agree, hence the RFC to pick some brains. :-) Would it make sense
> if this call doesn't export the physical address, but instead takes
> only the dmabuf fd and register offsets to be programmed?
>
> > If you want to do this properly, you need a minimal in-kernel memory
> > manager, and those tend to be based on top of drm_gem.c and merged
> > through the gpu tree. The last place where we accidentally leaked a
> > dma addr for gpu buffers was in the fbdev code, and we plugged that
> > one with
>
> Could you please help me understand how having an in-kernel memory manager
> helps? Isn't it just moving the same dmabuf import / paddr export
> functionality into different modules: kernel memory manager vs. uio? In
> fact, Xilinx does have such a memory manager based on drm gem downstream.
> But this time we took the approach of implementing this through the
> generic dmabuf allocator, ION, and enabling the import capability in the
> UIO infrastructure instead.
There's a group of people working on upstreaming a xilinx drm driver
already. Which driver are we talking about? Can you please provide a link
to that xilinx drm driver?
Thanks, Daniel
> Thanks,
> -hyun
>
> [1] https://github.com/OpenAMP/libmetal/pull/82/commits/951e2762bd487c98919ad12…
> [2] https://github.com/Xilinx/embeddedsw/tree/master/XilinxProcessorIPLib/drive…
>
> >
> > commit 4be9bd10e22dfc7fc101c5cf5969ef2d3a042d8a (tag:
> > drm-misc-next-fixes-2018-10-03)
> > Author: Neil Armstrong <narmstrong(a)baylibre.com>
> > Date: Fri Sep 28 14:05:55 2018 +0200
> >
> > drm/fb_helper: Allow leaking fbdev smem_start
> >
> > Together with cuse the above patch should be enough to implement a drm
> > driver entirely in userspace at least.
> >
> > Cheers, Daniel
> > --
> > Daniel Vetter
> > Software Engineer, Intel Corporation
> > +41 (0) 79 365 57 48 - http://blog.ffwll.ch
--
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch
On Wed, Feb 13, 2019 at 04:01:46PM -0800, Hyun Kwon wrote:
> Add "WITH Linux-syscall-note" to the license to not put GPL
> restrictions on user space programs using this header.
>
> Signed-off-by: Hyun Kwon <hyun.kwon(a)xilinx.com>
> ---
> drivers/staging/android/uapi/ion.h | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/staging/android/uapi/ion.h b/drivers/staging/android/uapi/ion.h
> index 5d70098..46c93fc 100644
> --- a/drivers/staging/android/uapi/ion.h
> +++ b/drivers/staging/android/uapi/ion.h
> @@ -1,4 +1,4 @@
> -/* SPDX-License-Identifier: GPL-2.0 */
> +/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
> /*
> * drivers/staging/android/uapi/ion.h
> *
> --
> 2.7.4
>
Yes, that is the correct thing to do, let me go queue this up.
thanks,
greg k-h
On 2/11/19 11:09 PM, Jing Xia wrote:
> gfp_flags is always set to high_order_gfp_flags even if allocations of
> order 0 are made. But for smaller allocations, the system should be able
> to reclaim some memory.
>
> Signed-off-by: Jing Xia <jing.xia(a)unisoc.com>
> Reviewed-by: Yuming Han <yuming.han(a)unisoc.com>
> Reviewed-by: Zhaoyang Huang <zhaoyang.huang(a)unisoc.com>
> Reviewed-by: Orson Zhai <orson.zhai(a)unisoc.com>
> ---
> drivers/staging/android/ion/ion_system_heap.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/staging/android/ion/ion_system_heap.c b/drivers/staging/android/ion/ion_system_heap.c
> index 0383f75..20f2103 100644
> --- a/drivers/staging/android/ion/ion_system_heap.c
> +++ b/drivers/staging/android/ion/ion_system_heap.c
> @@ -223,10 +223,10 @@ static void ion_system_heap_destroy_pools(struct ion_page_pool **pools)
> static int ion_system_heap_create_pools(struct ion_page_pool **pools)
> {
> int i;
> - gfp_t gfp_flags = low_order_gfp_flags;
>
> for (i = 0; i < NUM_ORDERS; i++) {
> struct ion_page_pool *pool;
> + gfp_t gfp_flags = low_order_gfp_flags;
>
> if (orders[i] > 4)
> gfp_flags = high_order_gfp_flags;
>
This was already submitted in
https://lore.kernel.org/lkml/1549004386-38778-1-git-send-email-saberlily.xi…
(I'm also very behind on Ion e-mail and need to catch up...)
Laura
On Fri, Feb 01, 2019 at 02:59:46PM +0800, Qing Xia wrote:
> In the first loop iteration, gfp_flags will be modified to
> high_order_gfp_flags, and there will be no chance to change it back to
> low_order_gfp_flags.
>
> Fixes: e7f63771 ("ION: Sys_heap: Add cached pool to spead up cached buffer alloc")
Huh... Presumably you found this bug just by reading the code. I
wonder how it affects runtime?
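My guess, from reading ion_system_heap.c: orders[] there is {8, 4, 0} and
the flags are

	static gfp_t high_order_gfp_flags = (GFP_HIGHUSER | __GFP_ZERO |
					     __GFP_NOWARN | __GFP_NORETRY) &
					    ~__GFP_RECLAIM;
	static gfp_t low_order_gfp_flags = GFP_HIGHUSER | __GFP_ZERO;

so the first pass (order 8) switches gfp_flags to high_order_gfp_flags and
it is never switched back. The order-4 and order-0 pools then allocate
with __GFP_NORETRY and without __GFP_RECLAIM, which presumably means small
allocations start failing under memory pressure instead of reclaiming.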
regards,
dan carpenter
On Tue, 22 Jan 2019, Andrew F. Davis wrote:
> On 1/21/19 4:12 PM, Liam Mark wrote:
> > On Mon, 21 Jan 2019, Christoph Hellwig wrote:
> >
> >> On Mon, Jan 21, 2019 at 11:44:10AM -0800, Liam Mark wrote:
> >>> The main use case is for allowing clients to pass in
> >>> DMA_ATTR_SKIP_CPU_SYNC in order to skip the default cache maintenance
> >>> which happens in dma_buf_map_attachment and dma_buf_unmap_attachment. In
> >>> ION the buffers aren't usually accessed from the CPU so this allows
> >>> clients to often avoid doing unnecessary cache maintenance.
> >>
> >> This can't work. The cpu can still easily speculate into this area.
> >
> > Can you provide more detail on your concern here?
> > The use case I am thinking about here is a cached buffer which is accessed
> > by a non-IO-coherent device (quite a common use case for ION).
> >
> > Guessing on your concern:
> > The speculative access can be an issue if you are going to access the
> > buffer from the CPU after the device has written to it; however, if you
> > know you aren't going to do any CPU access before the buffer is again
> > returned to the device, then I don't think the speculative access is a
> > concern.
> >
> >> Moreover in general these operations should be cheap if the addresses
> >> aren't cached.
> >>
> >
> > I am thinking of use cases with cached buffers here, so CMO isn't cheap.
> >
>
> These buffers are cacheable, not cached; if you haven't written anything
> the data won't actually be in the cache.
That's true
> And in the case of speculative cache
> filling the lines are marked clean. In either case the only cost is the
> little 7 instruction loop calling the clean/invalidate instruction (dc
> civac for ARMv8) for the cache-lines. Unless that is the cost you are
> trying to avoid?
>
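For reference, the loop being described is roughly the following; this is
a sketch modeled on the arm64 cache maintenance helpers, assuming a
64-byte line (the real code reads the line size from CTR_EL0):

	/* clean+invalidate [start, end), one dc civac per 64-byte line;
	 * an 8MB buffer is 8388608 / 64 = 131072 lines, so ~131k
	 * iterations per map or unmap */
	static void clean_inval_range(unsigned long start, unsigned long end)
	{
		unsigned long line = 64;	/* assumed line size */
		unsigned long addr;

		for (addr = start & ~(line - 1); addr < end; addr += line)
			asm volatile("dc civac, %0" : : "r" (addr) : "memory");
		asm volatile("dsb sy" : : : "memory");
	}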
This is the cost I am trying to avoid, and this comes back to our previous
discussion. We have a coherent system cache, so if you are doing this for
every cache line on a large buffer it adds up: both the CMO work itself
and the trips out to the bus.
For example, I believe 1080p buffers are 8 MB, and 4K buffers are even
larger.
I also still think you would want to solve this properly such that
invalidates aren't being done unnecessarily.
> In that case if you are mapping and unmapping so much that the little
> CMO here is hurting performance then I would argue your usage is broken
> and needs to be re-worked a bit.
>
I am not sure I would say it is broken; the large buffers (for example
1080p buffers) are mapped and unmapped on every frame. I don't think there
is any clean way to avoid that in a pipelining framework: you could ask
clients to keep the buffers dma-mapped, but there isn't necessarily a good
time to tell them to unmap.
It would be unfortunate to not consider this something legitimate for
userspace to do in a pipelining use case.
Requiring devices to stay attached doesn't seem very clean to me as there
isn't necessarily a nice place to tell them when to detach.
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project
On Tue, 22 Jan 2019, Andrew F. Davis wrote:
> On 1/21/19 4:18 PM, Liam Mark wrote:
> > On Mon, 21 Jan 2019, Andrew F. Davis wrote:
> >
> >> On 1/21/19 2:20 PM, Liam Mark wrote:
> >>> On Mon, 21 Jan 2019, Andrew F. Davis wrote:
> >>>
> >>>> On 1/21/19 1:44 PM, Liam Mark wrote:
> >>>>> On Mon, 21 Jan 2019, Christoph Hellwig wrote:
> >>>>>
> >>>>>> On Sat, Jan 19, 2019 at 08:50:41AM -0800, Laura Abbott wrote:
> >>>>>>>> And who is going to decide which ones to pass? And who documents
> >>>>>>>> which ones are safe?
> >>>>>>>>
> >>>>>>>> I'd much rather have explicit, well documented dma-buf flags that
> >>>>>>>> might get translated to the DMA API flags, which are not error checked,
> >>>>>>>> not very well documented, and way too easy to get wrong.
> >>>>>>>>
> >>>>>>>
> >>>>>>> I'm not sure having flags in dma-buf really solves anything
> >>>>>>> given drivers can use the attributes directly with dma_map
> >>>>>>> anyway, which is what we're looking to do. The intention
> >>>>>>> is for the driver creating the dma_buf attachment to have
> >>>>>>> the knowledge of which flags to use.
> >>>>>>
> >>>>>> Well, there are very few flags that you can simply use for all calls of
> >>>>>> dma_map*. And given how badly these flags are defined I just don't want
> >>>>>> people to add more places where they indirectly use these flags, as
> >>>>>> it will be more than enough work to clean up the current mess.
> >>>>>>
> >>>>>> What flag(s) do you want to pass this way, btw? Maybe that is where
> >>>>>> the problem is.
> >>>>>>
> >>>>>
> >>>>> The main use case is for allowing clients to pass in
> >>>>> DMA_ATTR_SKIP_CPU_SYNC in order to skip the default cache maintenance
> >>>>> which happens in dma_buf_map_attachment and dma_buf_unmap_attachment. In
> >>>>> ION the buffers aren't usually accessed from the CPU so this allows
> >>>>> clients to often avoid doing unnecessary cache maintenance.
> >>>>>
> >>>>
> >>>> How can a client know that no CPU access has occurred that needs to be
> >>>> flushed out?
> >>>>
> >>>
> >>> I have left this to clients, but if they own the buffer they can have the
> >>> knowledge as to whether CPU access is needed in that use case (for
> >>> example, post-processing).
> >>>
> >>> For example with the previous version of ION we left all decisions of
> >>> whether cache maintenance was required up to the client, they would use
> >>> the ION cache maintenance IOCTL to force cache maintenance only when it
> >>> was required.
> >>> In these cases almost all of the access was being done by the device and
> >>> in the rare cases CPU access was required clients would initiate the
> >>> required cache maintenance before and after the CPU access.
> >>>
> >>
> >> I think we have different definitions of "client". I'm talking about the
> >> DMA-BUF client (the importer); that is who can set this flag. It seems
> >> you mean the userspace application, which has no control over this flag.
> >>
> >
> > I am also talking about dma-buf clients; I am referring to both the
> > userspace and kernel components of the client. For example, our camera ION
> > client has both a userspace and a kernel component, and they have ION
> > buffers, whose access they control, which may or may not be
> > accessed by the CPU in certain use cases.
> >
>
> I know they often work together, but for this discussion it would be
> good to keep kernel clients and userspace clients separate. There are
> three types of actors at play here: userspace clients, kernel clients,
> and exporters.
>
> DMA-BUF only provides the basic sync primitive + mmap directly to
> userspace,
Well, dma-buf does provide dma_buf_kmap/dma_buf_begin_cpu_access, which
allow the same functionality in the kernel, but I don't think that changes
your argument.
> both operations are fulfilled by the exporter. This patch is
> about adding more control to the kernel side clients. The kernel side
> clients cannot know what userspace or other kernel side clients have
> done with the buffer, *only* the exporter has the whole picture.
>
> Therefore neither type of client should be deciding if the CPU needs to be
> flushed or not; only the exporter, based on the type of buffer, the
> currently set attachments, and previous actions (is this the first
> attachment, did the CPU get access in-between, etc.), can make this decision.
>
> Your goal seems to be to avoid unneeded CPU-side CMOs when a device
> detaches and another attaches with no CPU access in-between, right?
> That's reasonable to me, but it must be the exporter who keeps track and
> skips the CMO. This patch allows the client to tell the exporter the CMO
> is not needed, and that is not safe.
>
I agree it would be better to have this logic in the exporter, but I just
haven't heard of an upstreamable way to make that work.
But maybe to explore that a bit more.
If we consider having CPU access with no devices attached a legitimate use
case:
The pipelining use case I am thinking of is
1) dev 1 attach, map, access, unmap
2) dev 1 detach
3) (maybe) CPU access
4) dev 2 attach
5) dev 2 map, access
6) ...
It would be unfortunate to not consider this something legitimate for
userspace to do in a pipelining use case.
Requiring devices to stay attached doesn't seem very clean to me as there
isn't necessarily a nice place to tell them when to detach.
If we considered the above a supported use case, I think we could support
it in dma-buf (based on past discussions) if we had two things:
#1 we tracked the state of the buffer (for example, whether it has had a
previous cached/uncached write and no following CMO). Then when either the
CPU or a device was going to access a buffer it could decide, based on the
previous access, whether any CMO needs to be applied first.
#2 we had a non-architecture-specific way to apply cache maintenance
without a device, so that in step #3 the begin_cpu_access call could
successfully invalidate the buffer.
I think #1 is doable since we can tell if devices are IO coherent or not,
and we know the direction of accesses in dma map and begin cpu access.
I think we would probably agree that #2 is a problem though; getting the
kernel to expose that API seems like a hard argument.
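As a minimal sketch, with invented names and not from any posted patch,
the #1 tracking on the exporter side could look something like:

	enum buf_state { BUF_CLEAN, BUF_CPU_DIRTY, BUF_DEV_DIRTY };

	struct tracked_buf {
		struct mutex	lock;
		enum buf_state	state;
		bool		cached;		/* CPU-cacheable mapping */
	};

	/* called from the exporter's map_dma_buf: a CMO is only needed when
	 * the buffer is cacheable, the device is not IO coherent, and the
	 * CPU has touched the buffer since the last maintenance operation */
	static bool needs_cmo_before_dma(struct tracked_buf *buf,
					 bool dev_coherent)
	{
		bool need;

		mutex_lock(&buf->lock);
		need = buf->cached && !dev_coherent &&
		       buf->state == BUF_CPU_DIRTY;
		buf->state = BUF_DEV_DIRTY;
		mutex_unlock(&buf->lock);
		return need;
	}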
Liam
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project
Some stability changes to improve ION robustness and a perf-related
change to make it easier for clients to avoid unnecessary cache
maintenance, such as when buffers are clean and haven't had any CPU
access.
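To make the intent concrete, an importing driver that knows the CPU never
touches a buffer between maps would do roughly the following; this assumes
the new attachment field is named dma_mapping_attrs (the diffs are not
included in this summary, so treat the name as illustrative):

	struct dma_buf_attachment *attach;
	struct sg_table *sgt;

	attach = dma_buf_attach(dmabuf, dev);
	if (IS_ERR(attach))
		return PTR_ERR(attach);

	/* skip the default CPU cache maintenance on map/unmap */
	attach->dma_mapping_attrs = DMA_ATTR_SKIP_CPU_SYNC;

	sgt = dma_buf_map_attachment(attach, DMA_BIDIRECTIONAL);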
Liam Mark (4):
staging: android: ion: Support cpu access during dma_buf_detach
staging: android: ion: Restrict cache maintenance to dma mapped memory
dma-buf: add support for mapping with dma mapping attributes
staging: android: ion: Support for mapping with dma mapping attributes
 drivers/staging/android/ion/ion.c | 33 +++++++++++++++++++++++++--------
 include/linux/dma-buf.h           |  3 +++
 2 files changed, 28 insertions(+), 8 deletions(-)
--
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project