- Linaro-mm-sig - lists.linaro.org

Re: [Linaro-mm-sig] [RFC] Helper to abstract vma handling in media layer

by Hans Verkuil

On 04/11/2014 12:18 AM, Jan Kara wrote: > On Thu 10-04-14 23:57:38, Jan Kara wrote: >> On Thu 10-04-14 14:22:20, Hans Verkuil wrote: >>> On 04/10/14 14:15, Jan Kara wrote: >>>> On Thu 10-04-14 13:07:42, Hans Verkuil wrote: >>>>> On 04/10/14 12:32, Jan Kara wrote: >>>>>> Hello, >>>>>> >>>>>> On Thu 10-04-14 12:02:50, Marek Szyprowski wrote: >>>>>>> On 2014-03-17 20:49, Jan Kara wrote: >>>>>>>> The following patch series is my first stab at abstracting vma handling >>>>>>> >from the various media drivers. After this patch set drivers have to know >>>>>>>> much less details about vmas, their types, and locking. My motivation for >>>>>>>> the series is that I want to change get_user_pages() locking and I want >>>>>>>> to handle subtle locking details in as few places as possible. >>>>>>>> >>>>>>>> The core of the series is the new helper get_vaddr_pfns() which is given a >>>>>>>> virtual address and it fills in PFNs into provided array. If PFNs correspond to >>>>>>>> normal pages it also grabs references to these pages. The difference from >>>>>>>> get_user_pages() is that this function can also deal with pfnmap, mixed, and io >>>>>>>> mappings which is what the media drivers need. >>>>>>>> >>>>>>>> The patches are just compile tested (since I don't have any of the hardware >>>>>>>> I'm afraid I won't be able to do any more testing anyway) so please handle >>>>>>>> with care. I'm grateful for any comments. >>>>>>> >>>>>>> Thanks for posting this series! I will check if it works with our >>>>>>> hardware soon. This is something I wanted to introduce some time ago to >>>>>>> simplify buffer handling in dma-buf, but I had no time to start working. >>>>>> Thanks for having a look in the series. >>>>>> >>>>>>> However I would like to go even further with integration of your pfn >>>>>>> vector idea. This structure looks like a best solution for a compact >>>>>>> representation of the memory buffer, which should be considered by the >>>>>>> hardware as contiguous (either contiguous in physical memory or mapped >>>>>>> contiguously into dma address space by the respective iommu). As you >>>>>>> already noticed it is widely used by graphics and video drivers. >>>>>>> >>>>>>> I would also like to add support for pfn vector directly to the >>>>>>> dma-mapping subsystem. This can be done quite easily (even with a >>>>>>> fallback for architectures which don't provide method for it). I will try >>>>>>> to prepare rfc soon. This will finally remove the need for hacks in >>>>>>> media/v4l2-core/videobuf2-dma-contig.c >>>>>> That would be a worthwhile thing to do. When I was reading the code this >>>>>> seemed like something which could be done but I delibrately avoided doing >>>>>> more unification than necessary for my purposes as I don't have any >>>>>> hardware to test and don't know all the subtleties in the code... BTW, is >>>>>> there some way to test the drivers without the physical video HW? >>>>> >>>>> You can use the vivi driver (drivers/media/platform/vivi) for this. >>>>> However, while the vivi driver can import dma buffers it cannot export >>>>> them. If you want that, then you have to use this tree: >>>>> >>>>> http://git.linuxtv.org/cgit.cgi/hverkuil/media_tree.git/log/?h=vb2-part4 >>>> Thanks for the pointer that looks good. I've also found >>>> drivers/media/platform/mem2mem_testdev.c which seems to do even more >>>> testing of the area I made changes to. So now I have to find some userspace >>>> tool which can issue proper ioctls to setup and use the buffers and I can >>>> start testing what I wrote :) >>> >>> Get the v4l-utils.git repository (http://git.linuxtv.org/cgit.cgi/v4l-utils.git/). >>> You want the v4l2-ctl tool. Don't use the version supplied by your distro, >>> that's often too old. >>> >>> 'v4l2-ctl --help-streaming' gives the available options for doing streaming. >>> >>> So simple capturing from vivi is 'v4l2-ctl --stream-mmap' or '--stream-user'. >>> You can't test dmabuf unless you switch to the vb2-part4 branch of my tree. >> Great, it seems to be doing something and it shows there's some bug in my >> code. Thanks a lot for help. > OK, so after a small fix the basic functionality seems to be working. It > doesn't seem there's a way to test multiplanar buffers with vivi, is there? For that you need to switch to the vb2-part4 branch as well. That has support for multiplanar. Regards, Hans

11 years, 10 months

1
0
0 0

Re: [Linaro-mm-sig] [RFC] Helper to abstract vma handling in media layer

by Hans Verkuil

On 04/10/14 14:15, Jan Kara wrote: > On Thu 10-04-14 13:07:42, Hans Verkuil wrote: >> On 04/10/14 12:32, Jan Kara wrote: >>> Hello, >>> >>> On Thu 10-04-14 12:02:50, Marek Szyprowski wrote: >>>> On 2014-03-17 20:49, Jan Kara wrote: >>>>> The following patch series is my first stab at abstracting vma handling >>>> >from the various media drivers. After this patch set drivers have to know >>>>> much less details about vmas, their types, and locking. My motivation for >>>>> the series is that I want to change get_user_pages() locking and I want >>>>> to handle subtle locking details in as few places as possible. >>>>> >>>>> The core of the series is the new helper get_vaddr_pfns() which is given a >>>>> virtual address and it fills in PFNs into provided array. If PFNs correspond to >>>>> normal pages it also grabs references to these pages. The difference from >>>>> get_user_pages() is that this function can also deal with pfnmap, mixed, and io >>>>> mappings which is what the media drivers need. >>>>> >>>>> The patches are just compile tested (since I don't have any of the hardware >>>>> I'm afraid I won't be able to do any more testing anyway) so please handle >>>>> with care. I'm grateful for any comments. >>>> >>>> Thanks for posting this series! I will check if it works with our >>>> hardware soon. This is something I wanted to introduce some time ago to >>>> simplify buffer handling in dma-buf, but I had no time to start working. >>> Thanks for having a look in the series. >>> >>>> However I would like to go even further with integration of your pfn >>>> vector idea. This structure looks like a best solution for a compact >>>> representation of the memory buffer, which should be considered by the >>>> hardware as contiguous (either contiguous in physical memory or mapped >>>> contiguously into dma address space by the respective iommu). As you >>>> already noticed it is widely used by graphics and video drivers. >>>> >>>> I would also like to add support for pfn vector directly to the >>>> dma-mapping subsystem. This can be done quite easily (even with a >>>> fallback for architectures which don't provide method for it). I will try >>>> to prepare rfc soon. This will finally remove the need for hacks in >>>> media/v4l2-core/videobuf2-dma-contig.c >>> That would be a worthwhile thing to do. When I was reading the code this >>> seemed like something which could be done but I delibrately avoided doing >>> more unification than necessary for my purposes as I don't have any >>> hardware to test and don't know all the subtleties in the code... BTW, is >>> there some way to test the drivers without the physical video HW? >> >> You can use the vivi driver (drivers/media/platform/vivi) for this. >> However, while the vivi driver can import dma buffers it cannot export >> them. If you want that, then you have to use this tree: >> >> http://git.linuxtv.org/cgit.cgi/hverkuil/media_tree.git/log/?h=vb2-part4 > Thanks for the pointer that looks good. I've also found > drivers/media/platform/mem2mem_testdev.c which seems to do even more > testing of the area I made changes to. So now I have to find some userspace > tool which can issue proper ioctls to setup and use the buffers and I can > start testing what I wrote :) Get the v4l-utils.git repository (http://git.linuxtv.org/cgit.cgi/v4l-utils.git/). You want the v4l2-ctl tool. Don't use the version supplied by your distro, that's often too old. 'v4l2-ctl --help-streaming' gives the available options for doing streaming. So simple capturing from vivi is 'v4l2-ctl --stream-mmap' or '--stream-user'. You can't test dmabuf unless you switch to the vb2-part4 branch of my tree. If you need help with testing it's easiest to contact me on the #v4l irc channel. Regards, Hans

11 years, 10 months

1
0
0 0

Re: [Linaro-mm-sig] [RFC] Helper to abstract vma handling in media layer

by Hans Verkuil

On 04/10/14 12:32, Jan Kara wrote: > Hello, > > On Thu 10-04-14 12:02:50, Marek Szyprowski wrote: >> On 2014-03-17 20:49, Jan Kara wrote: >>> The following patch series is my first stab at abstracting vma handling >> >from the various media drivers. After this patch set drivers have to know >>> much less details about vmas, their types, and locking. My motivation for >>> the series is that I want to change get_user_pages() locking and I want >>> to handle subtle locking details in as few places as possible. >>> >>> The core of the series is the new helper get_vaddr_pfns() which is given a >>> virtual address and it fills in PFNs into provided array. If PFNs correspond to >>> normal pages it also grabs references to these pages. The difference from >>> get_user_pages() is that this function can also deal with pfnmap, mixed, and io >>> mappings which is what the media drivers need. >>> >>> The patches are just compile tested (since I don't have any of the hardware >>> I'm afraid I won't be able to do any more testing anyway) so please handle >>> with care. I'm grateful for any comments. >> >> Thanks for posting this series! I will check if it works with our >> hardware soon. This is something I wanted to introduce some time ago to >> simplify buffer handling in dma-buf, but I had no time to start working. > Thanks for having a look in the series. > >> However I would like to go even further with integration of your pfn >> vector idea. This structure looks like a best solution for a compact >> representation of the memory buffer, which should be considered by the >> hardware as contiguous (either contiguous in physical memory or mapped >> contiguously into dma address space by the respective iommu). As you >> already noticed it is widely used by graphics and video drivers. >> >> I would also like to add support for pfn vector directly to the >> dma-mapping subsystem. This can be done quite easily (even with a >> fallback for architectures which don't provide method for it). I will try >> to prepare rfc soon. This will finally remove the need for hacks in >> media/v4l2-core/videobuf2-dma-contig.c > That would be a worthwhile thing to do. When I was reading the code this > seemed like something which could be done but I delibrately avoided doing > more unification than necessary for my purposes as I don't have any > hardware to test and don't know all the subtleties in the code... BTW, is > there some way to test the drivers without the physical video HW? You can use the vivi driver (drivers/media/platform/vivi) for this. However, while the vivi driver can import dma buffers it cannot export them. If you want that, then you have to use this tree: http://git.linuxtv.org/cgit.cgi/hverkuil/media_tree.git/log/?h=vb2-part4 Regards, Hans

11 years, 10 months

1
0
0 0

Re: [Linaro-mm-sig] [RFC] Helper to abstract vma handling in media layer

by Marek Szyprowski

Hello, On 2014-03-17 20:49, Jan Kara wrote: > The following patch series is my first stab at abstracting vma handling > from the various media drivers. After this patch set drivers have to know > much less details about vmas, their types, and locking. My motivation for > the series is that I want to change get_user_pages() locking and I want > to handle subtle locking details in as few places as possible. > > The core of the series is the new helper get_vaddr_pfns() which is given a > virtual address and it fills in PFNs into provided array. If PFNs correspond to > normal pages it also grabs references to these pages. The difference from > get_user_pages() is that this function can also deal with pfnmap, mixed, and io > mappings which is what the media drivers need. > > The patches are just compile tested (since I don't have any of the hardware > I'm afraid I won't be able to do any more testing anyway) so please handle > with care. I'm grateful for any comments. Thanks for posting this series! I will check if it works with our hardware soon. This is something I wanted to introduce some time ago to simplify buffer handling in dma-buf, but I had no time to start working. However I would like to go even further with integration of your pfn vector idea. This structure looks like a best solution for a compact representation of the memory buffer, which should be considered by the hardware as contiguous (either contiguous in physical memory or mapped contiguously into dma address space by the respective iommu). As you already noticed it is widely used by graphics and video drivers. I would also like to add support for pfn vector directly to the dma-mapping subsystem. This can be done quite easily (even with a fallback for architectures which don't provide method for it). I will try to prepare rfc soon. This will finally remove the need for hacks in media/v4l2-core/videobuf2-dma-contig.c Thanks for motivating me to finally start working on this! Best regards -- Marek Szyprowski, PhD Samsung R&D Institute Poland

11 years, 10 months

1
0
0 0

Re: [Linaro-mm-sig] [PATCH 2/2] dma-buf: update exp_name when using dma_buf_export()

by Daniel Vetter

On Thu, Apr 10, 2014 at 01:30:06AM +0200, Javier Martinez Canillas wrote: > commit c0b00a5 ("dma-buf: update debugfs output") modified the > default exporter name to be the KBUILD_MODNAME pre-processor > macro instead of __FILE__ but the documentation was not updated. > > Also the "Supporting existing mmap interfaces in exporters" section > title seems wrong since talks about the interface used by importers. > > Signed-off-by: Javier Martinez Canillas <javier.martinez(a)collabora.co.uk> Reviewed-by: Daniel Vetter <daniel.vetter(a)ffwll.ch> > --- > Documentation/dma-buf-sharing.txt | 4 ++-- > 1 file changed, 2 insertions(+), 2 deletions(-) > > diff --git a/Documentation/dma-buf-sharing.txt b/Documentation/dma-buf-sharing.txt > index 505e711..7d61cef 100644 > --- a/Documentation/dma-buf-sharing.txt > +++ b/Documentation/dma-buf-sharing.txt > @@ -66,7 +66,7 @@ The dma_buf buffer sharing API usage contains the following steps: > > Exporting modules which do not wish to provide any specific name may use the > helper define 'dma_buf_export()', with the same arguments as above, but > - without the last argument; a __FILE__ pre-processor directive will be > + without the last argument; a KBUILD_MODNAME pre-processor directive will be > inserted in place of 'exp_name' instead. > > 2. Userspace gets a handle to pass around to potential buffer-users > @@ -352,7 +352,7 @@ Being able to mmap an export dma-buf buffer object has 2 main use-cases: > > No special interfaces, userspace simply calls mmap on the dma-buf fd. > > -2. Supporting existing mmap interfaces in exporters > +2. Supporting existing mmap interfaces in importers > > Similar to the motivation for kernel cpu access it is again important that > the userspace code of a given importing subsystem can use the same interfaces > -- > 1.9.0 > > _______________________________________________ > dri-devel mailing list > dri-devel(a)lists.freedesktop.org > http://lists.freedesktop.org/mailman/listinfo/dri-devel -- Daniel Vetter Software Engineer, Intel Corporation +41 (0) 79 365 57 48 - http://blog.ffwll.ch

11 years, 10 months

1
0
0 0

Performance issues with dmabuf

by kiran

Hi, I was looking at some code(given below) which seems to perform very badly when attachments and detachments to used to simulate cache coherency. In the code below, when remote_attach is false(ie no remote processors), using just the two A9 cores the following code runs in 8.8 seconds. But when remote_attach is true then even though there are other cores also executing and sharing the workload the following code takes 52.7 seconds. This shows that detach and attach is very heavy for this kind of code. (The system call detach performs dma_buf_unmap_attachment and dma_buf_detach, system call attach performs dma_buf_attach and dma_buf_map_attachment). for (k = 0; k < N; k++) { if(remote_attach) { detach(path) ; attach(path) ; } for(i = start_indx; i < end_indx; i++) { for (j = 0; j < N; j++) { if(path[i][j] < (path[i][k] + path[k][j])) { path[i][j] = path[i][k] + path[k][j] ; } } } } I would like to manage the cache explicitly and flush cache lines rather than pages to reduce overhead. I also want to access these buffers from the userspace. I can change some kernel code for this. Where should I start ? Thanks in advance. --Kiran

11 years, 10 months

3
6
0 0

[RFC] dma-buf: Implement test module

by Thierry Reding

This is a simple test module that can be used to allocate, export and delete DMA-BUF objects. It can be used to test DMA-BUF sharing in systems that lack a real second driver. Signed-off-by: Thierry Reding <treding(a)nvidia.com> --- drivers/base/Kconfig | 4 + drivers/base/Makefile | 1 + drivers/base/dma-buf-test.c | 308 ++++++++++++++++++++++++++++++++++++++++++++ 3 files changed, 313 insertions(+) create mode 100644 drivers/base/dma-buf-test.c diff --git a/drivers/base/Kconfig b/drivers/base/Kconfig index e373671652b0..bed2abb9491b 100644 --- a/drivers/base/Kconfig +++ b/drivers/base/Kconfig @@ -200,6 +200,10 @@ config DMA_SHARED_BUFFER APIs extension; the file's descriptor can then be passed on to other driver. +config DMA_BUF_TEST + tristate "DMA-BUF test module" + depends on DMA_SHARED_BUFFER + config DMA_CMA bool "DMA Contiguous Memory Allocator" depends on HAVE_DMA_CONTIGUOUS && CMA diff --git a/drivers/base/Makefile b/drivers/base/Makefile index 94e8a80e87f8..cad983b6626f 100644 --- a/drivers/base/Makefile +++ b/drivers/base/Makefile @@ -25,3 +25,4 @@ obj-$(CONFIG_PINCTRL) += pinctrl.o ccflags-$(CONFIG_DEBUG_DRIVER) := -DDEBUG +obj-$(CONFIG_DMA_BUF_TEST) += dma-buf-test.o diff --git a/drivers/base/dma-buf-test.c b/drivers/base/dma-buf-test.c new file mode 100644 index 000000000000..f5498b74a09b --- /dev/null +++ b/drivers/base/dma-buf-test.c @@ -0,0 +1,308 @@ +#include <linux/dma-buf.h> +#include <linux/err.h> +#include <linux/fs.h> +#include <linux/miscdevice.h> +#include <linux/module.h> +#include <linux/slab.h> +#include <linux/uaccess.h> + +struct dmabuf_create { + __u32 flags; + __u32 size; +}; + +#define DMABUF_IOCTL_BASE 'D' +#define DMABUF_IOCTL_CREATE _IOWR(DMABUF_IOCTL_BASE, 0, struct dmabuf_create) +#define DMABUF_IOCTL_DELETE _IOWR(DMABUF_IOCTL_BASE, 1, int) +#define DMABUF_IOCTL_EXPORT _IOWR(DMABUF_IOCTL_BASE, 2, int) + +struct dmabuf_file { + struct dma_buf *buf; + dma_addr_t phys; + size_t size; + void *virt; +}; + +static int dmabuf_attach(struct dma_buf *buf, struct device *dev, + struct dma_buf_attachment *attach) +{ + return 0; +} + +static void dmabuf_detach(struct dma_buf *buf, + struct dma_buf_attachment *attach) +{ +} + +static struct sg_table *dmabuf_map_dma_buf(struct dma_buf_attachment *attach, + enum dma_data_direction dir) +{ + struct dmabuf_file *priv = attach->dmabuf->priv; + struct sg_table *sgt; + + sgt = kmalloc(sizeof(*sgt), GFP_KERNEL); + if (!sgt) + return NULL; + + if (sg_alloc_table(sgt, 1, GFP_KERNEL)) { + kfree(sgt); + return NULL; + } + + sg_dma_address(sgt->sgl) = priv->phys; + sg_dma_len(sgt->sgl) = priv->size; + + return sgt; +} + +static void dmabuf_unmap_dma_buf(struct dma_buf_attachment *attach, + struct sg_table *sgt, + enum dma_data_direction dir) +{ + sg_free_table(sgt); + kfree(sgt); +} + +static void dmabuf_release(struct dma_buf *buf) +{ +} + +static int dmabuf_begin_cpu_access(struct dma_buf *buf, size_t size, + size_t length, + enum dma_data_direction direction) +{ + return 0; +} + +static void dmabuf_end_cpu_access(struct dma_buf *buf, size_t size, + size_t length, + enum dma_data_direction direction) +{ +} + +static void *dmabuf_kmap_atomic(struct dma_buf *buf, unsigned long page) +{ + return NULL; +} + +static void dmabuf_kunmap_atomic(struct dma_buf *buf, unsigned long page, + void *vaddr) +{ +} + +static void *dmabuf_kmap(struct dma_buf *buf, unsigned long page) +{ + return NULL; +} + +static void dmabuf_kunmap(struct dma_buf *buf, unsigned long page, void *vaddr) +{ +} + +static void dmabuf_vm_open(struct vm_area_struct *vma) +{ +} + +static void dmabuf_vm_close(struct vm_area_struct *vma) +{ +} + +static int dmabuf_vm_fault(struct vm_area_struct *vma, struct vm_fault *vmf) +{ + return 0; +} + +static const struct vm_operations_struct dmabuf_vm_ops = { + .open = dmabuf_vm_open, + .close = dmabuf_vm_close, + .fault = dmabuf_vm_fault, +}; + +static int dmabuf_mmap(struct dma_buf *buf, struct vm_area_struct *vma) +{ + pgprot_t prot = vm_get_page_prot(vma->vm_flags); + struct dmabuf_file *priv = buf->priv; + + vma->vm_flags |= VM_IO | VM_PFNMAP | VM_DONTEXPAND | VM_DONTDUMP; + vma->vm_ops = &dmabuf_vm_ops; + vma->vm_private_data = priv; + vma->vm_page_prot = pgprot_writecombine(prot); + + return remap_pfn_range(vma, vma->vm_start, priv->phys >> PAGE_SHIFT, + vma->vm_end - vma->vm_start, vma->vm_page_prot); +} + +static void *dmabuf_vmap(struct dma_buf *buf) +{ + return NULL; +} + +static void dmabuf_vunmap(struct dma_buf *buf, void *vaddr) +{ +} + +static const struct dma_buf_ops dmabuf_ops = { + .attach = dmabuf_attach, + .detach = dmabuf_detach, + .map_dma_buf = dmabuf_map_dma_buf, + .unmap_dma_buf = dmabuf_unmap_dma_buf, + .release = dmabuf_release, + .begin_cpu_access = dmabuf_begin_cpu_access, + .end_cpu_access = dmabuf_end_cpu_access, + .kmap_atomic = dmabuf_kmap_atomic, + .kunmap_atomic = dmabuf_kunmap_atomic, + .kmap = dmabuf_kmap, + .kunmap = dmabuf_kunmap, + .mmap = dmabuf_mmap, + .vmap = dmabuf_vmap, + .vunmap = dmabuf_vunmap, +}; + +static int dmabuf_file_open(struct inode *inode, struct file *file) +{ + struct dmabuf_file *priv; + int ret = 0; + + priv = kzalloc(sizeof(*priv), GFP_KERNEL); + if (!priv) + return -ENOMEM; + + file->private_data = priv; + + return ret; +} + +static int dmabuf_file_release(struct inode *inode, struct file *file) +{ + struct dmabuf_file *priv = file->private_data; + int ret = 0; + + if (priv->virt) + dma_free_writecombine(NULL, priv->size, priv->virt, priv->phys); + + if (priv->buf) + dma_buf_put(priv->buf); + + kfree(priv); + + return ret; +} + +static int dmabuf_ioctl_create(struct dmabuf_file *priv, const void __user *data) +{ + struct dmabuf_create args; + int ret = 0; + + if (priv->buf || priv->virt) + return -EBUSY; + + if (copy_from_user(&args, data, sizeof(args))) + return -EFAULT; + + priv->virt = dma_alloc_writecombine(NULL, args.size, &priv->phys, + GFP_KERNEL | __GFP_NOWARN); + if (!priv->virt) + return -ENOMEM; + + priv->buf = dma_buf_export(priv, &dmabuf_ops, args.size, args.flags); + if (!priv->buf) { + ret = -ENOMEM; + goto free; + } + + if (IS_ERR(priv->buf)) { + ret = PTR_ERR(priv->buf); + goto free; + } + + priv->size = args.size; + + return 0; + +free: + dma_free_writecombine(NULL, priv->size, priv->virt, priv->phys); + priv->virt = NULL; + return ret; +} + +static int dmabuf_ioctl_delete(struct dmabuf_file *priv, unsigned long flags) +{ + dma_free_writecombine(NULL, priv->size, priv->virt, priv->phys); + priv->virt = NULL; + priv->phys = 0; + priv->size = 0; + + dma_buf_put(priv->buf); + priv->buf = NULL; + + return 0; +} + +static int dmabuf_ioctl_export(struct dmabuf_file *priv, unsigned long flags) +{ + int err; + + get_dma_buf(priv->buf); + + err = dma_buf_fd(priv->buf, flags); + if (err < 0) + dma_buf_put(priv->buf); + + return err; +} + +static long dmabuf_file_ioctl(struct file *file, unsigned int cmd, + unsigned long arg) +{ + struct dmabuf_file *priv = file->private_data; + long ret = 0; + + switch (cmd) { + case DMABUF_IOCTL_CREATE: + ret = dmabuf_ioctl_create(priv, (const void __user *)arg); + break; + + case DMABUF_IOCTL_DELETE: + ret = dmabuf_ioctl_delete(priv, arg); + break; + + case DMABUF_IOCTL_EXPORT: + ret = dmabuf_ioctl_export(priv, arg); + break; + + default: + ret = -ENOTTY; + break; + } + + return ret; +} + +static const struct file_operations dmabuf_fops = { + .owner = THIS_MODULE, + .open = dmabuf_file_open, + .release = dmabuf_file_release, + .unlocked_ioctl = dmabuf_file_ioctl, +}; + +static struct miscdevice dmabuf_device = { + .minor = 128, + .name = "dmabuf", + .fops = &dmabuf_fops, +}; + +static int __init dmabuf_init(void) +{ + return misc_register(&dmabuf_device); +} +module_init(dmabuf_init); + +static void __exit dmabuf_exit(void) +{ + misc_deregister(&dmabuf_device); +} +module_exit(dmabuf_exit); + +MODULE_AUTHOR("Thierry Reding <treding(a)nvidia.com>"); +MODULE_DESCRIPTION("DMA-BUF test driver"); +MODULE_LICENSE("GPL v2"); -- 1.8.4.2

11 years, 10 months

6
18
0 0

[PATCH] dma-buf: add meta data attachment

by Bin Wang

we found there'd be varied specific data connected to a dmabuf, like the iommu mapping of buffer, the dma descriptors (that can be shared between several components). Though these info might be able to be generated every time before dma operations, for performance sake, it's better to be kept before really invalid. Change-Id: I89d43dc3fe1ee3da91c42074da5df71b968e6d3c Signed-off-by: Bin Wang <binw(a)marvell.com> --- drivers/base/dma-buf.c | 100 +++++++++++++++++++++++++++++++++++++++++++++++ include/linux/dma-buf.h | 22 ++++++++++ 2 files changed, 122 insertions(+), 0 deletions(-) diff --git a/drivers/base/dma-buf.c b/drivers/base/dma-buf.c index 08fe897..5c82e60 100644 --- a/drivers/base/dma-buf.c +++ b/drivers/base/dma-buf.c @@ -50,6 +50,7 @@ static int dma_buf_release(struct inode *inode, struct file *file) BUG_ON(dmabuf->vmapping_counter); + dma_buf_meta_release(dmabuf); dmabuf->ops->release(dmabuf); mutex_lock(&db_list.lock); @@ -138,6 +139,7 @@ struct dma_buf *dma_buf_export_named(void *priv, const struct dma_buf_ops *ops, mutex_init(&dmabuf->lock); INIT_LIST_HEAD(&dmabuf->attachments); + INIT_LIST_HEAD(&dmabuf->metas); mutex_lock(&db_list.lock); list_add(&dmabuf->list_node, &db_list.head); @@ -570,6 +572,104 @@ void dma_buf_vunmap(struct dma_buf *dmabuf, void *vaddr) } EXPORT_SYMBOL_GPL(dma_buf_vunmap); +/** + * dma_buf_meta_attach - Attach additional meta data to the dmabuf + * @dmabuf: [in] the dmabuf to attach to + * @id: [in] the id of the meta data + * @pdata: [in] the raw data to be attached + * @release: [in] the callback to release the meta data + */ +int dma_buf_meta_attach(struct dma_buf *dmabuf, int id, void *pdata, + int (*release)(void *)) +{ + struct dma_buf_meta *pmeta; + + pmeta = kmalloc(sizeof(struct dma_buf_meta), GFP_KERNEL); + if (pmeta == NULL) + return -ENOMEM; + + pmeta->id = id; + pmeta->pdata = pdata; + pmeta->release = release; + + mutex_lock(&dmabuf->lock); + list_add(&pmeta->node, &dmabuf->metas); + mutex_unlock(&dmabuf->lock); + + return 0; +} +EXPORT_SYMBOL_GPL(dma_buf_meta_attach); + +/** + * dma_buf_meta_dettach - Dettach the meta data from dmabuf by id + * @dmabuf: [in] the dmabuf including the meta data + * @id: [in] the id of the meta data + */ +int dma_buf_meta_dettach(struct dma_buf *dmabuf, int id) +{ + struct dma_buf_meta *pmeta, *tmp; + int ret = -ENOENT; + + mutex_lock(&dmabuf->lock); + list_for_each_entry_safe(pmeta, tmp, &dmabuf->metas, node) { + if (pmeta->id == id) { + if (pmeta->release) + pmeta->release(pmeta->pdata); + list_del(&pmeta->node); + kfree(pmeta); + ret = 0; + break; + } + } + mutex_unlock(&dmabuf->lock); + + return ret; +} +EXPORT_SYMBOL_GPL(dma_buf_meta_dettach); + +/** + * dma_buf_meta_fetch - Get the meta data from dmabuf by id + * @dmabuf: [in] the dmabuf including the meta data + * @id: [in] the id of the meta data + */ +void *dma_buf_meta_fetch(struct dma_buf *dmabuf, int id) +{ + struct dma_buf_meta *pmeta; + void *pdata = NULL; + + mutex_lock(&dmabuf->lock); + list_for_each_entry(pmeta, &dmabuf->metas, node) { + if (pmeta->id == id) { + pdata = pmeta->pdata; + break; + } + } + mutex_unlock(&dmabuf->lock); + + return pdata; +} +EXPORT_SYMBOL_GPL(dma_buf_meta_fetch); + +/** + * dma_buf_meta_release - Release all the meta data attached to the dmabuf + * @dmabuf: [in] the dmabuf including the meta data + */ +void dma_buf_meta_release(struct dma_buf *dmabuf) +{ + struct dma_buf_meta *pmeta, *tmp; + + mutex_lock(&dmabuf->lock); + list_for_each_entry_safe(pmeta, tmp, &dmabuf->metas, node) { + if (pmeta->release) + pmeta->release(pmeta->pdata); + list_del(&pmeta->node); + kfree(pmeta); + } + mutex_unlock(&dmabuf->lock); + + return; +} + #ifdef CONFIG_DEBUG_FS static int dma_buf_describe(struct seq_file *s) { diff --git a/include/linux/dma-buf.h b/include/linux/dma-buf.h index dfac5ed..369d032 100644 --- a/include/linux/dma-buf.h +++ b/include/linux/dma-buf.h @@ -120,6 +120,7 @@ struct dma_buf { size_t size; struct file *file; struct list_head attachments; + struct list_head metas; const struct dma_buf_ops *ops; /* mutex to serialize list manipulation, attach/detach and vmap/unmap */ struct mutex lock; @@ -149,6 +150,20 @@ struct dma_buf_attachment { }; /** + * struct dma_buf_meta - holds varied meta data attached to the buffer + * @id: the identification of the meta data + * @dmabuf: buffer for this attachment. + * @node: list of dma_buf_meta. + * @pdata: specific meta data. + */ +struct dma_buf_meta { + int id; + struct list_head node; + int (*release)(void *pdata); + void *pdata; +}; + +/** * get_dma_buf - convenience wrapper for get_file. * @dmabuf: [in] pointer to dma_buf * @@ -194,6 +209,13 @@ int dma_buf_mmap(struct dma_buf *, struct vm_area_struct *, unsigned long); void *dma_buf_vmap(struct dma_buf *); void dma_buf_vunmap(struct dma_buf *, void *vaddr); + +int dma_buf_meta_attach(struct dma_buf *dmabuf, int id, void *pdata, + int (*release)(void *)); +int dma_buf_meta_dettach(struct dma_buf *dmabuf, int id); +void *dma_buf_meta_fetch(struct dma_buf *dmabuf, int id); +void dma_buf_meta_release(struct dma_buf *dmabuf); + int dma_buf_debugfs_create_file(const char *name, int (*write)(struct seq_file *)); #endif /* __DMA_BUF_H__ */ -- 1.7.0.4

11 years, 10 months

5
5
0 0

Re: [Linaro-mm-sig] [PATCH v5 09/11] arm: add support for reserved memory defined by device tree

by Grant Likely

On Thu, 13 Mar 2014 14:51:56 -0700, Kevin Hilman <khilman(a)linaro.org> wrote: > Josh Cartwright <joshc(a)codeaurora.org> writes: > > > On Thu, Mar 13, 2014 at 01:46:50PM -0700, Kevin Hilman wrote: > >> On Fri, Feb 21, 2014 at 4:25 AM, Marek Szyprowski > >> <m.szyprowski(a)samsung.com> wrote: > >> > Enable reserved memory initialization from device tree. > >> > > >> > Signed-off-by: Marek Szyprowski <m.szyprowski(a)samsung.com> > >> > >> This patch has hit -next and several legacy (non-DT) boot failures > >> were detected and bisected down to this patch. A quick scan looks > >> like there needs to be some sanity checking whether a DT is even > >> present. > > > > Hmm. Yes, the code unconditionally calls of_flat_dt_scan(), which will > > gladly touch initial_boot_params, even though it may be uninitialized. > > The below patch should allow these boards to boot... > > > > However, I'm wondering if there is a good reason why we don't parse the > > /reserved-memory nodes at the right after we parse the /memory nodes as > > part of early_init_dt_scan()... > > > > Thanks, > > Josh > > > > --8<-- > > Subject: [PATCH] drivers: of: only scan for reserved mem when fdt present > > > > Reported-by: Kevin Hilman <khilman(a)linaro.org> > > Signed-off-by: Josh Cartwright <joshc(a)codeaurora.org> > > This gets legacy boot working again. Thanks. > > Tested-by: Kevin Hilman <khilman(a)linaro.org> Applied and confirmed on non-DT qemu boot. Thanks. It will be pushed out shortly. g.

11 years, 11 months

1
0
0 0

[RFC][PATCH 1/1] ion: support multiple address space map

by Hiroshi Doyu

Hi, Our H/Ws(inc. GPU) can accept a dmabuf fd which is originally allocated via ION. GPU tries to map this buffer into GPU address space via ION dmabuf backend(.map_dma_buf). Our Tegra IOMMU supports multiple address space in IOMMU. Basically each device has its own address space assigned by IOMMU. Usually DMA API hides the existance of IOMMU from driver code. To support this multiple IOMMU address space, I think that at least "ion_map_dma_buf" needs to create IOMMU mapping *per* device(address space). And I made the following patch *experimentally*. Is this the right solution to solve this multiple IOMMU address space? Any comment would be really appreciated. ---8<------8<------8<------8<------8<------8<------8<------8<--- From: Hiroshi Doyu <hdoyu(a)nvidia.com> Ion doesn't support multiple address space. The recent IOMMU usually provide multiple address spaces, where a mapping depends on an address space. An address space is bind to a device pointer passed from attachment. Those info needs to be stored in a buffer info. If the same buffer needs to be mapped into the same AS, the same mapping info needs to be passed back. Signed-off-by: Hiroshi Doyu <hdoyu(a)nvidia.com> --- drivers/staging/android/ion/ion.c | 96 +++++++++++++++++++++++++++++++++- drivers/staging/android/ion/ion_priv.h | 10 ++++ 2 files changed, 104 insertions(+), 2 deletions(-) diff --git a/drivers/staging/android/ion/ion.c b/drivers/staging/android/ion/ion.c index 0836717..e260434 100644 --- a/drivers/staging/android/ion/ion.c +++ b/drivers/staging/android/ion/ion.c @@ -872,17 +872,109 @@ static void ion_buffer_sync_for_device(struct ion_buffer *buffer, static struct sg_table *ion_map_dma_buf(struct dma_buf_attachment *attachment, enum dma_data_direction direction) { + int err, i, empty = -1; + struct dma_iommu_mapping *iommu_map; struct dma_buf *dmabuf = attachment->dmabuf; struct ion_buffer *buffer = dmabuf->priv; + unsigned int nents = buffer->sg_table->nents; + struct ion_mapping *map_ptr; + struct scatterlist *sg; + + iommu_map = to_dma_iommu_mapping(attachment->dev); + if (!iommu_map) { + ion_buffer_sync_for_device(buffer, attachment->dev, direction); + return buffer->sg_table; + } + + mutex_lock(&buffer->lock); + for (i = 0; i < ARRAY_SIZE(buffer->mapping); i++) { + map_ptr = &buffer->mapping[i]; + if (!map_ptr->dev) { + empty = i; + continue; + } + + if (to_dma_iommu_mapping(map_ptr->dev) == iommu_map) { + kref_get(&map_ptr->kref); + mutex_unlock(&buffer->lock); + return &map_ptr->sgt; + } + } - ion_buffer_sync_for_device(buffer, attachment->dev, direction); - return buffer->sg_table; + if (!empty) { + err = -ENOMEM; + goto err_no_space; + } + + map_ptr = &buffer->mapping[empty]; + err = sg_alloc_table(&map_ptr->sgt, nents, GFP_KERNEL); + if (err) + goto err_sg_alloc_table; + + for_each_sg(buffer->sg_table->sgl, sg, nents, i) + memcpy(map_ptr->sgt.sgl + i, sg, sizeof(*sg)); + + nents = dma_map_sg(attachment->dev, map_ptr->sgt.sgl, nents, direction); + if (!nents) { + err = -EINVAL; + goto err_dma_map_sg; + } + + kref_init(&map_ptr->kref); + map_ptr->dev = attachment->dev; + mutex_unlock(&buffer->lock); + return &map_ptr->sgt; + +err_dma_map_sg: + sg_free_table(&map_ptr->sgt); +err_sg_alloc_table: +err_no_space: + mutex_unlock(&buffer->lock); + return ERR_PTR(err); +} + +static void __ion_unmap_dma_buf(struct kref *kref) +{ + struct ion_mapping *map_ptr; + + map_ptr = container_of(kref, struct ion_mapping, kref); + dma_unmap_sg(map_ptr->dev, map_ptr->sgt.sgl, map_ptr->sgt.nents, + DMA_BIDIRECTIONAL); + sg_free_table(&map_ptr->sgt); + memset(map_ptr, 0, sizeof(*map_ptr)); } static void ion_unmap_dma_buf(struct dma_buf_attachment *attachment, struct sg_table *table, enum dma_data_direction direction) { + int i; + struct dma_iommu_mapping *iommu_map; + struct dma_buf *dmabuf = attachment->dmabuf; + struct ion_buffer *buffer = dmabuf->priv; + struct ion_mapping *map_ptr; + + iommu_map = to_dma_iommu_mapping(attachment->dev); + if (!iommu_map) + return; + + mutex_lock(&buffer->lock); + for (i = 0; i < ARRAY_SIZE(buffer->mapping); i++) { + map_ptr = &buffer->mapping[i]; + if (!map_ptr->dev) + continue; + + if (to_dma_iommu_mapping(map_ptr->dev) == iommu_map) { + kref_put(&map_ptr->kref, __ion_unmap_dma_buf); + mutex_unlock(&buffer->lock); + return; + } + } + + dev_warn(attachment->dev, "Not found a map(%p)\n", + to_dma_iommu_mapping(attachment->dev)); + + mutex_unlock(&buffer->lock); } void ion_pages_sync_for_device(struct device *dev, struct page *page, diff --git a/drivers/staging/android/ion/ion_priv.h b/drivers/staging/android/ion/ion_priv.h index 1eba3f2..441c251 100644 --- a/drivers/staging/android/ion/ion_priv.h +++ b/drivers/staging/android/ion/ion_priv.h @@ -26,11 +26,19 @@ #include <linux/sched.h> #include <linux/shrinker.h> #include <linux/types.h> +#include <linux/scatterlist.h> #include "ion.h" struct ion_buffer *ion_handle_buffer(struct ion_handle *handle); +struct ion_mapping { + struct device *dev; /* to get a map and dma_ops */ + struct sg_table sgt; + struct kref kref; +}; +#define NUM_ION_MAPPING 5 /* FIXME: dynamically allocate more than this */ + /** * struct ion_buffer - metadata for a particular buffer * @ref: refernce count @@ -84,6 +92,8 @@ struct ion_buffer { int handle_count; char task_comm[TASK_COMM_LEN]; pid_t pid; + + struct ion_mapping mapping[NUM_ION_MAPPING]; }; void ion_buffer_destroy(struct ion_buffer *buffer); -- 1.8.1.5

11 years, 11 months

1
0
0 0

Re: [Linaro-mm-sig] [PATCH v5 09/11] arm: add support for reserved memory defined by device tree

by Josh Cartwright

On Thu, Mar 13, 2014 at 01:46:50PM -0700, Kevin Hilman wrote: > On Fri, Feb 21, 2014 at 4:25 AM, Marek Szyprowski > <m.szyprowski(a)samsung.com> wrote: > > Enable reserved memory initialization from device tree. > > > > Signed-off-by: Marek Szyprowski <m.szyprowski(a)samsung.com> > > This patch has hit -next and several legacy (non-DT) boot failures > were detected and bisected down to this patch. A quick scan looks > like there needs to be some sanity checking whether a DT is even > present. Hmm. Yes, the code unconditionally calls of_flat_dt_scan(), which will gladly touch initial_boot_params, even though it may be uninitialized. The below patch should allow these boards to boot... However, I'm wondering if there is a good reason why we don't parse the /reserved-memory nodes at the right after we parse the /memory nodes as part of early_init_dt_scan()... Thanks, Josh --8<-- Subject: [PATCH] drivers: of: only scan for reserved mem when fdt present Reported-by: Kevin Hilman <khilman(a)linaro.org> Signed-off-by: Josh Cartwright <joshc(a)codeaurora.org> --- drivers/of/fdt.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/drivers/of/fdt.c b/drivers/of/fdt.c index 510c0d8..501bc83 100644 --- a/drivers/of/fdt.c +++ b/drivers/of/fdt.c @@ -557,6 +557,9 @@ static int __init __fdt_scan_reserved_mem(unsigned long node, const char *uname, */ void __init early_init_fdt_scan_reserved_mem(void) { + if (!initial_boot_params) + return; + of_scan_flat_dt(__fdt_scan_reserved_mem, NULL); fdt_init_reserved_mem(); } -- Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by The Linux Foundation

11 years, 11 months

1
0
0 0

[PATCH v6 00/11] reserved-memory regions/CMA in devicetree, again

by Marek Szyprowski

Hello again! Here is another update of the support for reserved memory regions in device tree. I've fixes a few more minor issues pointed by Grant. See changelog for more details. The initial code for this feature were posted here [1], merged as commit 9d8eab7af79cb4ce2de5de39f82c455b1f796963 ("drivers: of: add initialization code for dma reserved memory") and later reverted by commit 1931ee143b0ab72924944bc06e363d837ba05063. For more information, see [2]. Finally a new bindings has been proposed [3] and Josh Cartwright a few days ago prepared some code which implements those bindings [4]. This finally pushed me again to find some time to finish this task and review the code. Josh agreed to give me the ownership of this series to continue preparing them for mainline inclusion. For more information please refer to the changlelog and links below. [1]: http://lkml.kernel.org/g/1377527959-5080-1-git-send-email-m.szyprowski@sams… [2]: http://lkml.kernel.org/g/1381476448-14548-1-git-send-email-m.szyprowski@sam… [3]: http://lkml.kernel.org/g/20131030134702.19B57C402A0@trevor.secretlab.ca [4]: http://thread.gmane.org/gmane.linux.documentation/19579 Changelog: v6: - removed the need for "#memory-region-cells" property - fixed compilation issues on some systems - some other minor code cleanups v5: https://lkml.org/lkml/2014/2/21/147 - sliced main patch into several smaller patches on Grant's request - fixed coding style issues pointed by Grant - use node->phandle value directly instead of parsing properties manually v4: https://lkml.org/lkml/2014/2/20/150 - dynamic allocations are processed after all static reservations has been done - moved code for handling static reservations to drivers/of/fdt.c - removed node matching by string comparison, now phandle values are used directly - moved code for DMA and CMA handling directly to drivers/base/dma-{coherent,contiguous}.c - added checks for proper #size-cells, #address-cells, ranges properties in /reserved-memory node - even more code cleanup - added init code for ARM64 and PowerPC v3: http://article.gmane.org/gmane.linux.documentation/20169/ - refactored memory reservation code, created common code to parse reg, size, align, alloc-ranges properties - added support for multiple tuples in 'reg' property - memory is reserved regardless of presence of the driver for its compatible - prepared arch specific hooks for memory reservation (defaults use memblock calls) - removed node matching by string during device initialization - CMA init code: added checks for required region alignment - more code cleanup here and there v2: http://thread.gmane.org/gmane.linux.documentation/19870/ - removed copying of the node name - split shared-dma-pool handling into separate files (one for CMA and one for dma_declare_coherent based implementations) for making the code easier to understand - added support for AMBA devices, changed prototypes to use struct decice instead of struct platform_device - renamed some functions to better match other names used in drivers/of/ - restructured the rest of the code a bit for better readability - added 'reusable' property to exmaple linux,cma node in documentation - exclusive dma (dma_coherent) is used for only handling 'shared-dma-pool' regions without 'reusable' property and CMA is used only for handling 'shared-dma-pool' regions with 'reusable' property. v1: http://thread.gmane.org/gmane.linux.documentation/19579 - initial version prepared by Josh Cartwright Summary: Grant Likely (1): of: document bindings for reserved-memory nodes Marek Szyprowski (10): drivers: of: add initialization code for static reserved memory drivers: of: add initialization code for dynamic reserved memory drivers: of: add support for custom reserved memory drivers drivers: of: add automated assignment of reserved regions to client devices drivers: of: initialize and assign reserved memory to newly created devices drivers: dma-coherent: add initialization from device tree drivers: dma-contiguous: add initialization from device tree arm: add support for reserved memory defined by device tree arm64: add support for reserved memory defined by device tree powerpc: add support for reserved memory defined by device tree .../bindings/reserved-memory/reserved-memory.txt | 136 ++++++++++ arch/arm/Kconfig | 1 + arch/arm/mm/init.c | 2 + arch/arm64/Kconfig | 1 + arch/arm64/mm/init.c | 1 + arch/powerpc/Kconfig | 1 + arch/powerpc/kernel/prom.c | 3 + drivers/base/dma-coherent.c | 40 +++ drivers/base/dma-contiguous.c | 129 +++++++-- drivers/of/Kconfig | 6 + drivers/of/Makefile | 1 + drivers/of/fdt.c | 140 ++++++++++ drivers/of/of_reserved_mem.c | 287 ++++++++++++++++++++ drivers/of/platform.c | 7 + include/asm-generic/vmlinux.lds.h | 11 + include/linux/of_fdt.h | 3 + include/linux/of_reserved_mem.h | 60 ++++ 17 files changed, 807 insertions(+), 22 deletions(-) create mode 100644 Documentation/devicetree/bindings/reserved-memory/reserved-memory.txt create mode 100644 drivers/of/of_reserved_mem.c create mode 100644 include/linux/of_reserved_mem.h -- 1.7.9.5

11 years, 11 months

3
19
0 0

[PATCH 0/6] dma-buf synchronization patches

by Maarten Lankhorst

The following series implements fence and converts dma-buf and android sync to use it. Patch 6 and 7 add support for polling to dma-buf, blocking until all fences are signaled. I've dropped the extra patch to copy an export from the core, and instead use the public version of it. I've had to fix some fallout from the rebase, hopefully everything's clean now, and ready for -next. --- Maarten Lankhorst (6): fence: dma-buf cross-device synchronization (v17) seqno-fence: Hardware dma-buf implementation of fencing (v4) dma-buf: use reservation objects android: convert sync to fence api, v3 reservation: add support for fences to enable cross-device synchronisation dma-buf: add poll support, v2 Documentation/DocBook/device-drivers.tmpl | 3 drivers/base/Kconfig | 9 drivers/base/Makefile | 2 drivers/base/dma-buf.c | 123 +++ drivers/base/fence.c | 465 +++++++++++++ drivers/gpu/drm/drm_prime.c | 8 drivers/gpu/drm/exynos/exynos_drm_dmabuf.c | 2 drivers/gpu/drm/i915/i915_gem_dmabuf.c | 2 drivers/gpu/drm/nouveau/nouveau_drm.c | 1 drivers/gpu/drm/nouveau/nouveau_gem.h | 1 drivers/gpu/drm/nouveau/nouveau_prime.c | 7 drivers/gpu/drm/omapdrm/omap_gem_dmabuf.c | 2 drivers/gpu/drm/radeon/radeon_drv.c | 2 drivers/gpu/drm/radeon/radeon_prime.c | 8 drivers/gpu/drm/ttm/ttm_object.c | 2 drivers/media/v4l2-core/videobuf2-dma-contig.c | 2 drivers/staging/android/Kconfig | 1 drivers/staging/android/Makefile | 2 drivers/staging/android/sw_sync.c | 4 drivers/staging/android/sync.c | 892 ++++++++---------------- drivers/staging/android/sync.h | 80 +- drivers/staging/android/sync_debug.c | 245 +++++++ drivers/staging/android/trace/sync.h | 12 include/drm/drmP.h | 2 include/linux/dma-buf.h | 21 - include/linux/fence.h | 329 +++++++++ include/linux/reservation.h | 18 include/linux/seqno-fence.h | 109 +++ include/trace/events/fence.h | 125 +++ 29 files changed, 1822 insertions(+), 657 deletions(-) create mode 100644 drivers/base/fence.c create mode 100644 drivers/staging/android/sync_debug.c create mode 100644 include/linux/fence.h create mode 100644 include/linux/seqno-fence.h create mode 100644 include/trace/events/fence.h -- Signature

11 years, 11 months

5
22
0 0

Re: [Linaro-mm-sig] [PATCH 2/6] seqno-fence: Hardware dma-buf implementation of fencing (v4)

by Maarten Lankhorst

op 17-02-14 19:41, Christian König schreef: > Am 17.02.2014 19:24, schrieb Rob Clark: >> On Mon, Feb 17, 2014 at 12:36 PM, Christian König >> <deathsimple(a)vodafone.de> wrote: >>> Am 17.02.2014 18:27, schrieb Rob Clark: >>> >>>> On Mon, Feb 17, 2014 at 11:56 AM, Christian König >>>> <deathsimple(a)vodafone.de> wrote: >>>>> Am 17.02.2014 16:56, schrieb Maarten Lankhorst: >>>>> >>>>>> This type of fence can be used with hardware synchronization for simple >>>>>> hardware that can block execution until the condition >>>>>> (dma_buf[offset] - value) >= 0 has been met. >>>>> >>>>> Can't we make that just "dma_buf[offset] != 0" instead? As far as I know >>>>> this way it would match the definition M$ uses in their WDDM >>>>> specification >>>>> and so make it much more likely that hardware supports it. >>>> well 'buf[offset] >= value' at least means the same slot can be used >>>> for multiple operations (with increasing values of 'value').. not sure >>>> if that is something people care about. >>>> >>>>> =value seems to be possible with adreno and radeon. I'm not really sure >>>>> about others (although I presume it as least supported for nv desktop >>>>> stuff). For hw that cannot do >=value, we can either have a different fence >>>>> implementation which uses the !=0 approach. Or change seqno-fence >>>>> implementation later if needed. But if someone has hw that can do !=0 but >>>>> not >=value, speak up now ;-) >>> >>> Here! Radeon can only do >=value on the DMA and 3D engine, but not with UVD >>> or VCE. And for the 3D engine it means draining the pipe, which isn't really >>> a good idea. >> hmm, ok.. forgot you have a few extra rings compared to me. Is UVD >> re-ordering from decode-order to display-order for you in hw? If not, >> I guess you need sw intervention anyways when a frame is done for >> frame re-ordering, so maybe hw->hw sync doesn't really matter as much >> as compared to gpu/3d->display. For dma<->3d interactions, seems like >> you would care more about hw<->hw sync, but I guess you aren't likely >> to use GPU A to do a resolve blit for GPU B.. > > No UVD isn't reordering, but since frame reordering is predictable you usually end up with pipelining everything to the hardware. E.g. you send the decode commands in decode order to the UVD block and if you have overlay active one of the frames are going to be the first to display and then you want to wait for it on the display side. > >> For 3D ring, I assume you probably want a CP_WAIT_FOR_IDLE before a >> CP_MEM_WRITE to update fence value in memory (for the one signalling >> the fence). But why would you need that before a CP_WAIT_REG_MEM (for >> the one waiting for the fence)? I don't exactly have documentation >> for adreno version of CP_WAIT_REG_{MEM,EQ,GTE}.. but PFP and ME >> appear to be same instruction set as r600, so I'm pretty sure they >> should have similar capabilities.. CP_WAIT_REG_MEM appears to be same >> but with 32bit gpu addresses vs 64b. > > You shouldn't use any of the CP commands for engine synchronization (neither for wait nor for signal). The PFP and ME are just the top of a quite deep pipeline and when you use any of the CP_WAIT functions you block them for something and that's draining the pipeline. > > With the semaphore and fence commands the values are just attached as prerequisite to the draw command, e.g. the CP setups the draw environment and issues the command, but the actual execution of it is delayed until the "!= 0" condition hits. And in the meantime the CP already prepares the next draw operation. > > But at least for compute queues wait semaphore aren't the perfect solution either. What you need then is a GPU scheduler that uses a kernel task for setting up the command submission for you when all prerequisites are meet. nouveau has sort of a scheduler in hardware. It can yield when waiting on a semaphore. And each process gets their own context and the timeslices can be adjusted. ;-) But I don't mind changing this patch when an actual user pops up. Nouveau can do a wait for (*sema & mask) != 0 only on nvc0 and newer, where mask can be chosen. But it can do == somevalue and >= somevalue on older relevant optimus hardware, so if we know that it was zero before and we know the sign of the new value that could work too. Adding ops and a separate mask later on when users pop up is fine with me, the original design here was chosen so I could map the intel status page read-only into the process specific nvidia vm. ~Maarten

11 years, 11 months

2
1
0 0

How to manage cached dmabuf importer private data?

by Hiroshi Doyu

Hi, We have a problem about how to manage cached dmabuf importer private data, where to keep, how to reuse and how to clean up. We want to keep some data in dmabuf importer side until an buffer is free'ed actually since a buffer can be reused again later in that importer subsystem so that that cache data doesn't have to be regenerated. This can be considered as some kind of caching this data. The scenario is: (1) Exporter passes a dmabuf to Importer. (2) Importer attaches a dev to a dmabuf. (3) Importer generates some data for a buffer for its own use. (4) Importer finishes its use of a buffer. (5) Importer detaches a dev from a dmabuf. (6) Again, Exporter passes a dmabuf fd to the same Importer. (7) Again, Importer attaches a dev to a dmabuf. (8) Importer wants to use the previously cached data from (2) without regenerating. (9) Again, Importer detaches a dev from a dmabuf. (10) Exporter free's a buffer along with a cached data from (2)/(8). At first I considered to use attachmenet private data, but apparently a life time of the attachment isn't equal to one of a buffer. A buffer lives longer than an attachment. Also Neither private data from dmabuf nor from attachment are for /Importer/. They are for Exporter's use from the comment in the header file. /** * struct dma_buf - shared buffer object .... * @priv: exporter specific private data for this buffer object. */ /** * struct dma_buf_attachment - holds device-buffer attachment data ... * @priv: exporter specific attachment data. ... */ This leads to the following 2 questions: One question is how to clean up the cached data at (10) since there's no way for Importer to trigger clean up at that time. I am considering to embed an /notifier/ in dmabuf when it's called at dmabuf release. Importer could register any callback in that notifier. At least this requires a dmabuf to have an notifier to be called at release. Does this sound acceptable? Or can we do the same outside of dmabuf framework? If there's more appropriate way, please let me know since I'm not so familier with drm side yet. Another question is where to keep that cached data. Usually that data is only valid within Impoter subsystem. So Imoporter could keep the list of that data in it as a global list along with a dmabuf pointer. When a dmabuf is imported, Importer can look up a global list if it's already cached. This list needs to be kept till a buffer is free'ed. Those can be implemented in the dmabuf exporter backend but we want to allow multiple allocators/exporters to do the same, and I want to avoid having something related to importer in exporter side. Any comment would be really appreciated.

11 years, 11 months

2
1
0 0

[PATCH v5 00/11] reserved-memory regions/CMA in devicetree, again

by Marek Szyprowski

Hi all! Ok, I hope that this is the last update of the patches which add basic support for dynamic allocation of memory reserved regions defined in device tree. This time I've mainly sliced the main patch into several smaller pieces to make the changes easier to understand and fixes some minor coding style issues. The initial code for this feature were posted here [1], merged as commit 9d8eab7af79cb4ce2de5de39f82c455b1f796963 ("drivers: of: add initialization code for dma reserved memory") and later reverted by commit 1931ee143b0ab72924944bc06e363d837ba05063. For more information, see [2]. Finally a new bindings has been proposed [3] and Josh Cartwright a few days ago prepared some code which implements those bindings [4]. This finally pushed me again to find some time to finish this task and review the code. Josh agreed to give me the ownership of this series to continue preparing them for mainline inclusion. For more information please refer to the changlelog and links below. [1]: http://lkml.kernel.org/g/1377527959-5080-1-git-send-email-m.szyprowski@sams… [2]: http://lkml.kernel.org/g/1381476448-14548-1-git-send-email-m.szyprowski@sam… [3]: http://lkml.kernel.org/g/20131030134702.19B57C402A0@trevor.secretlab.ca [4]: http://thread.gmane.org/gmane.linux.documentation/19579 Changelog: v5: - sliced main patch into several smaller patches on Grant's request - fixed coding style issues pointed by Grant - use node->phandle value directly instead of parsing properties manually v4: https://lkml.org/lkml/2014/2/20/150 - dynamic allocations are processed after all static reservations has been done - moved code for handling static reservations to drivers/of/fdt.c - removed node matching by string comparison, now phandle values are used directly - moved code for DMA and CMA handling directly to drivers/base/dma-{coherent,contiguous}.c - added checks for proper #size-cells, #address-cells, ranges properties in /reserved-memory node - even more code cleanup - added init code for ARM64 and PowerPC v3: http://article.gmane.org/gmane.linux.documentation/20169/ - refactored memory reservation code, created common code to parse reg, size, align, alloc-ranges properties - added support for multiple tuples in 'reg' property - memory is reserved regardless of presence of the driver for its compatible - prepared arch specific hooks for memory reservation (defaults use memblock calls) - removed node matching by string during device initialization - CMA init code: added checks for required region alignment - more code cleanup here and there v2: http://thread.gmane.org/gmane.linux.documentation/19870/ - removed copying of the node name - split shared-dma-pool handling into separate files (one for CMA and one for dma_declare_coherent based implementations) for making the code easier to understand - added support for AMBA devices, changed prototypes to use struct decice instead of struct platform_device - renamed some functions to better match other names used in drivers/of/ - restructured the rest of the code a bit for better readability - added 'reusable' property to exmaple linux,cma node in documentation - exclusive dma (dma_coherent) is used for only handling 'shared-dma-pool' regions without 'reusable' property and CMA is used only for handling 'shared-dma-pool' regions with 'reusable' property. v1: http://thread.gmane.org/gmane.linux.documentation/19579 - initial version prepared by Josh Cartwright Summary: Grant Likely (1): of: document bindings for reserved-memory nodes Marek Szyprowski (10): drivers: of: add initialization code for static reserved memory drivers: of: add initialization code for dynamic reserved memory drivers: of: add support for custom reserved memory drivers drivers: of: add automated assignment of reserved regions to client devices drivers: of: initialize and assign reserved memory to newly created devices drivers: dma-coherent: add initialization from device tree drivers: dma-contiguous: add initialization from device tree arm: add support for reserved memory defined by device tree arm64: add support for reserved memory defined by device tree powerpc: add support for reserved memory defined by device tree .../bindings/reserved-memory/reserved-memory.txt | 138 ++++++++++ arch/arm/Kconfig | 1 + arch/arm/mm/init.c | 2 + arch/arm64/Kconfig | 1 + arch/arm64/mm/init.c | 1 + arch/powerpc/Kconfig | 1 + arch/powerpc/kernel/prom.c | 3 + drivers/base/dma-coherent.c | 41 +++ drivers/base/dma-contiguous.c | 130 +++++++-- drivers/of/Kconfig | 6 + drivers/of/Makefile | 1 + drivers/of/fdt.c | 134 +++++++++ drivers/of/of_reserved_mem.c | 291 ++++++++++++++++++++ drivers/of/platform.c | 7 + include/asm-generic/vmlinux.lds.h | 11 + include/linux/of_fdt.h | 3 + include/linux/of_reserved_mem.h | 61 ++++ 17 files changed, 810 insertions(+), 22 deletions(-) create mode 100644 Documentation/devicetree/bindings/reserved-memory/reserved-memory.txt create mode 100644 drivers/of/of_reserved_mem.c create mode 100644 include/linux/of_reserved_mem.h -- 1.7.9.5

11 years, 11 months

3
20
0 0

Use of dma_buf_unmap_attachment in interrupt context?

by Hans Verkuil

A quick question: can dma_buf_unmap_attachment be called from interrupt context? It is the dmabuf equivalent to e.g. dma_sync_sg_for_cpu or dma_unmap_sg, and those can be called from interrupt context. I cannot see anything specific about this in the sources or dma-buf-sharing.txt. If it turns out that dma_buf_unmap_attachment can be called from atomic context, then that should be documented, I think. Regards, Hans

11 years, 11 months

1
0
0 0

CMA on AArch64

by Laura Abbott

Hi, I noticed there is currently no CMA support for AArch64. Is this already on someone's TODO list or is this still open? Thanks, Laura -- Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by The Linux Foundation

11 years, 11 months

4
5
0 0

[PATCH 0/2] arm: dma-mapping: add dynamic resize of IOVA bitmap

by Marek Szyprowski

Hello, This patchset is a continuation of the work started by Andreas Herrmann to add support for dynamically resized bitmaps for IOMMU based DMA-mapping implementation for ARM architecture. Some more discussion and rationale has been discussed in the following thread: http://www.spinics.net/lists/arm-kernel/msg303732.html The first patch adds support for on-demand extending IO address space bitmap. It is based on the original work by Andreas Herrmann, but I decided to drop arm_iommu_create_mapping() api change part. The second patch removes the 'order' hack, which was used to reduce the size of a bitmap. The first patch solved the problem of too large io address space bitmaps, so the 'order' hack is no longer needed. The parameters of the arm_iommu_create_mapping() function can be then simplified by dropping 'order' parameter without any functional change of the whole subsystem. This parameter was already a bit misunderstood, so the overall result is also a little improvement of the API. Best regards Marek Szyprowski, PhD Samsung R&D Institute Poland Andreas Herrmann (1): arm: dma-mapping: Add support to extend DMA IOMMU mappings Marek Szyprowski (1): arm: dma-mapping: remove order parameter from arm_iommu_create_mapping() arch/arm/include/asm/dma-iommu.h | 12 ++- arch/arm/mm/dma-mapping.c | 144 +++++++++++++++++++++++------ drivers/gpu/drm/exynos/exynos_drm_drv.h | 2 - drivers/gpu/drm/exynos/exynos_drm_iommu.c | 6 +- drivers/gpu/drm/exynos/exynos_drm_iommu.h | 1 - drivers/iommu/shmobile-iommu.c | 2 +- 6 files changed, 124 insertions(+), 43 deletions(-) -- 1.7.9.5

11 years, 11 months

1
2
0 0

[PATCH v4 0/7] reserved-memory regions/CMA in devicetree, again

by Marek Szyprowski

Hi all! This is another quick update of the patches which add basic support for dynamic allocation of memory reserved regions defined in device tree. This time I hope I've really managed to address all the issues reported by Grant Likely, see the change log for more details. As a bonus, I've added support for ARM64 and PowerPC, as those architectures had quite well defined place where early memory reservation can be done. The initial code for this feature were posted here [1], merged as commit 9d8eab7af79cb4ce2de5de39f82c455b1f796963 ("drivers: of: add initialization code for dma reserved memory") and later reverted by commit 1931ee143b0ab72924944bc06e363d837ba05063. For more information, see [2]. Finally a new bindings has been proposed [3] and Josh Cartwright a few days ago prepared some code which implements those bindings [4]. This finally pushed me again to find some time to finish this task and review the code. Josh agreed to give me the ownership of this series to continue preparing them for mainline inclusion. For more information please refer to the changlelog below. [1]: http://lkml.kernel.org/g/1377527959-5080-1-git-send-email-m.szyprowski@sams… [2]: http://lkml.kernel.org/g/1381476448-14548-1-git-send-email-m.szyprowski@sam… [3]: http://lkml.kernel.org/g/20131030134702.19B57C402A0@trevor.secretlab.ca [4]: http://thread.gmane.org/gmane.linux.documentation/19579 Changelog: v4: - dynamic allocations are processed after all static reservations has been done - moved code for handling static reservations to drivers/of/fdt.c - removed node matching by string comparison, now phandle values are used directly - moved code for DMA and CMA handling directly to drivers/base/dma-{coherent,contiguous}.c - added checks for proper #size-cells, #address-cells, ranges properties in /reserved-memory node - even more code cleanup - added init code for ARM64 and PowerPC v3: http://article.gmane.org/gmane.linux.documentation/20169/ - refactored memory reservation code, created common code to parse reg, size, align, alloc-ranges properties - added support for multiple tuples in 'reg' property - memory is reserved regardless of presence of the driver for its compatible - prepared arch specific hooks for memory reservation (defaults use memblock calls) - removed node matching by string during device initialization - CMA init code: added checks for required region alignment - more code cleanup here and there v2: http://thread.gmane.org/gmane.linux.documentation/19870/ - removed copying of the node name - split shared-dma-pool handling into separate files (one for CMA and one for dma_declare_coherent based implementations) for making the code easier to understand - added support for AMBA devices, changed prototypes to use struct decice instead of struct platform_device - renamed some functions to better match other names used in drivers/of/ - restructured the rest of the code a bit for better readability - added 'reusable' property to exmaple linux,cma node in documentation - exclusive dma (dma_coherent) is used for only handling 'shared-dma-pool' regions without 'reusable' property and CMA is used only for handling 'shared-dma-pool' regions with 'reusable' property. v1: http://thread.gmane.org/gmane.linux.documentation/19579 - initial version prepared by Josh Cartwright Summary: Grant Likely (1): of: document bindings for reserved-memory nodes Marek Szyprowski (6): drivers: of: add initialization code for reserved memory drivers: dma-coherent: add initialization from device tree drivers: dma-contiguous: add initialization from device tree arm: add support for reserved memory defined by device tree arm64: add support for reserved memory defined by device tree powerpc: add support for reserved memory defined by device tree .../bindings/reserved-memory/reserved-memory.txt | 138 +++++++++ arch/arm/Kconfig | 1 + arch/arm/mm/init.c | 3 + arch/arm64/Kconfig | 1 + arch/arm64/mm/init.c | 2 + arch/powerpc/Kconfig | 1 + arch/powerpc/kernel/prom.c | 3 + drivers/base/dma-coherent.c | 41 +++ drivers/base/dma-contiguous.c | 130 +++++++-- drivers/of/Kconfig | 6 + drivers/of/Makefile | 1 + drivers/of/fdt.c | 145 ++++++++++ drivers/of/of_reserved_mem.c | 296 ++++++++++++++++++++ drivers/of/platform.c | 7 + include/asm-generic/vmlinux.lds.h | 11 + include/linux/of_reserved_mem.h | 65 +++++ 16 files changed, 829 insertions(+), 22 deletions(-) create mode 100644 Documentation/devicetree/bindings/reserved-memory/reserved-memory.txt create mode 100644 drivers/of/of_reserved_mem.c create mode 100644 include/linux/of_reserved_mem.h -- 1.7.9.5

11 years, 11 months

2
11
0 0

[PATCH v3 0/6] reserved-memory regions/CMA in devicetree, again

by Marek Szyprowski

Hi all! This is yet another update of the second attempt to add basic support for dynamic allocation of memory reserved regions defined in device tree. This time I've tried to address all the issues reported by Grant Likely. The side-effect of it is a complete rewrite of memory reservation code, which results in added support for for multiple tuples in 'reg' property and complete support for 'size', 'align' and 'alloc-ranges' properties. The initial code for this feature were posted here [1], merged as commit 9d8eab7af79cb4ce2de5de39f82c455b1f796963 ("drivers: of: add initialization code for dma reserved memory") and later reverted by commit 1931ee143b0ab72924944bc06e363d837ba05063. For more information, see [2]. Finally a new bindings has been proposed [3] and Josh Cartwright a few days ago prepared some code which implements those bindings [4]. This finally pushed me again to find some time to finish this task and review the code. Josh agreed to give me the ownership of this series to continue preparing them for mainline inclusion. For more information please refer to the changlelog below. [1]: http://lkml.kernel.org/g/1377527959-5080-1-git-send-email-m.szyprowski@sams… [2]: http://lkml.kernel.org/g/1381476448-14548-1-git-send-email-m.szyprowski@sam… [3]: http://lkml.kernel.org/g/20131030134702.19B57C402A0@trevor.secretlab.ca [4]: http://thread.gmane.org/gmane.linux.documentation/19579 Changelog: v3: - refactored memory reservation code, created common code to parse reg, size, align, alloc-ranges properties - added support for multiple tuples in 'reg' property - memory is reserved regardless of presence of the driver for its compatible - prepared arch specific hooks for memory reservation (defaults use memblock calls) - removed node matching by string during device initialization - CMA init code: added checks for required region alignment - more code cleanup here and there v2: http://thread.gmane.org/gmane.linux.documentation/19870/ - removed copying of the node name - split shared-dma-pool handling into separate files (one for CMA and one for dma_declare_coherent based implementations) for making the code easier to understand - added support for AMBA devices, changed prototypes to use struct decice instead of struct platform_device - renamed some functions to better match other names used in drivers/of/ - restructured the rest of the code a bit for better readability - added 'reusable' property to exmaple linux,cma node in documentation - exclusive dma (dma_coherent) is used for only handling 'shared-dma-pool' regions without 'reusable' property and CMA is used only for handling 'shared-dma-pool' regions with 'reusable' property. v1: http://thread.gmane.org/gmane.linux.documentation/19579 - initial version prepared by Josh Cartwright Summary: Grant Likely (1): of: document bindings for reserved-memory nodes Josh Cartwright (2): drivers: of: implement reserved-memory handling for dma drivers: of: implement reserved-memory handling for cma Marek Szyprowski (3): base: dma-contiguous: add dma_contiguous_init_reserved_mem() function drivers: of: add initialization code for reserved memory ARM: init: add support for reserved memory defined by device tree .../bindings/reserved-memory/reserved-memory.txt | 138 +++++++ arch/arm/Kconfig | 1 + arch/arm/mm/init.c | 3 + drivers/base/dma-contiguous.c | 70 ++-- drivers/of/Kconfig | 19 + drivers/of/Makefile | 3 + drivers/of/fdt.c | 2 + drivers/of/of_reserved_mem.c | 390 ++++++++++++++++++++ drivers/of/of_reserved_mem_cma.c | 68 ++++ drivers/of/of_reserved_mem_dma.c | 65 ++++ drivers/of/platform.c | 7 + include/asm-generic/vmlinux.lds.h | 11 + include/linux/dma-contiguous.h | 7 + include/linux/of_reserved_mem.h | 65 ++++ 14 files changed, 827 insertions(+), 22 deletions(-) create mode 100644 Documentation/devicetree/bindings/reserved-memory/reserved-memory.txt create mode 100644 drivers/of/of_reserved_mem.c create mode 100644 drivers/of/of_reserved_mem_cma.c create mode 100644 drivers/of/of_reserved_mem_dma.c create mode 100644 include/linux/of_reserved_mem.h -- 1.7.9.5

11 years, 11 months

2
11
0 0

Re: [Linaro-mm-sig] [PATCH 2/6] seqno-fence: Hardware dma-buf implementation of fencing (v4)

by Rob Clark

On Mon, Feb 17, 2014 at 12:36 PM, Christian König <deathsimple(a)vodafone.de> wrote: > Am 17.02.2014 18:27, schrieb Rob Clark: > >> On Mon, Feb 17, 2014 at 11:56 AM, Christian König >> <deathsimple(a)vodafone.de> wrote: >>> >>> Am 17.02.2014 16:56, schrieb Maarten Lankhorst: >>> >>>> This type of fence can be used with hardware synchronization for simple >>>> hardware that can block execution until the condition >>>> (dma_buf[offset] - value) >= 0 has been met. >>> >>> >>> Can't we make that just "dma_buf[offset] != 0" instead? As far as I know >>> this way it would match the definition M$ uses in their WDDM >>> specification >>> and so make it much more likely that hardware supports it. >> >> well 'buf[offset] >= value' at least means the same slot can be used >> for multiple operations (with increasing values of 'value').. not sure >> if that is something people care about. >> >>> =value seems to be possible with adreno and radeon. I'm not really sure >>> about others (although I presume it as least supported for nv desktop >>> stuff). For hw that cannot do >=value, we can either have a different fence >>> implementation which uses the !=0 approach. Or change seqno-fence >>> implementation later if needed. But if someone has hw that can do !=0 but >>> not >=value, speak up now ;-) > > > Here! Radeon can only do >=value on the DMA and 3D engine, but not with UVD > or VCE. And for the 3D engine it means draining the pipe, which isn't really > a good idea. hmm, ok.. forgot you have a few extra rings compared to me. Is UVD re-ordering from decode-order to display-order for you in hw? If not, I guess you need sw intervention anyways when a frame is done for frame re-ordering, so maybe hw->hw sync doesn't really matter as much as compared to gpu/3d->display. For dma<->3d interactions, seems like you would care more about hw<->hw sync, but I guess you aren't likely to use GPU A to do a resolve blit for GPU B.. For 3D ring, I assume you probably want a CP_WAIT_FOR_IDLE before a CP_MEM_WRITE to update fence value in memory (for the one signalling the fence). But why would you need that before a CP_WAIT_REG_MEM (for the one waiting for the fence)? I don't exactly have documentation for adreno version of CP_WAIT_REG_{MEM,EQ,GTE}.. but PFP and ME appear to be same instruction set as r600, so I'm pretty sure they should have similar capabilities.. CP_WAIT_REG_MEM appears to be same but with 32bit gpu addresses vs 64b. BR, -R > Christian. > > >> >>> Apart from that I still don't like the idea of leaking a drivers IRQ >>> context >>> outside of the driver, but without a proper GPU scheduler there probably >>> isn't much alternative. >> >> I guess it will be not uncommon scenario for gpu device to just need >> to kick display device to write a few registers for a page flip.. >> probably best not to schedule a worker just for this (unless the >> signalled device otherwise needs to). I think it is better in this >> case to give the signalee some rope to hang themselves, and make it >> the responsibility of the callback to kick things off to a worker if >> needed. >> >> BR, >> -R >> >>> Christian. >>> >>>> A software fallback still has to be provided in case the fence is used >>>> with a device that doesn't support this mechanism. It is useful to >>>> expose >>>> this for graphics cards that have an op to support this. >>>> >>>> Some cards like i915 can export those, but don't have an option to wait, >>>> so they need the software fallback. >>>> >>>> I extended the original patch by Rob Clark. >>>> >>>> v1: Original >>>> v2: Renamed from bikeshed to seqno, moved into dma-fence.c since >>>> not much was left of the file. Lots of documentation added. >>>> v3: Use fence_ops instead of custom callbacks. Moved to own file >>>> to avoid circular dependency between dma-buf.h and fence.h >>>> v4: Add spinlock pointer to seqno_fence_init >>>> >>>> Signed-off-by: Maarten Lankhorst <maarten.lankhorst(a)canonical.com> >>>> --- >>>> Documentation/DocBook/device-drivers.tmpl | 1 >>>> drivers/base/fence.c | 50 +++++++++++++ >>>> include/linux/seqno-fence.h | 109 >>>> +++++++++++++++++++++++++++++ >>>> 3 files changed, 160 insertions(+) >>>> create mode 100644 include/linux/seqno-fence.h >>>> >>>> diff --git a/Documentation/DocBook/device-drivers.tmpl >>>> b/Documentation/DocBook/device-drivers.tmpl >>>> index 7a0c9ddb4818..8c85c20942c2 100644 >>>> --- a/Documentation/DocBook/device-drivers.tmpl >>>> +++ b/Documentation/DocBook/device-drivers.tmpl >>>> @@ -131,6 +131,7 @@ X!Edrivers/base/interface.c >>>> !Edrivers/base/dma-buf.c >>>> !Edrivers/base/fence.c >>>> !Iinclude/linux/fence.h >>>> +!Iinclude/linux/seqno-fence.h >>>> !Edrivers/base/reservation.c >>>> !Iinclude/linux/reservation.h >>>> !Edrivers/base/dma-coherent.c >>>> diff --git a/drivers/base/fence.c b/drivers/base/fence.c >>>> index 12df2bf62034..cd0937127a89 100644 >>>> --- a/drivers/base/fence.c >>>> +++ b/drivers/base/fence.c >>>> @@ -25,6 +25,7 @@ >>>> #include <linux/export.h> >>>> #include <linux/atomic.h> >>>> #include <linux/fence.h> >>>> +#include <linux/seqno-fence.h> >>>> #define CREATE_TRACE_POINTS >>>> #include <trace/events/fence.h> >>>> @@ -413,3 +414,52 @@ __fence_init(struct fence *fence, const struct >>>> fence_ops *ops, >>>> trace_fence_init(fence); >>>> } >>>> EXPORT_SYMBOL(__fence_init); >>>> + >>>> +static const char *seqno_fence_get_driver_name(struct fence *fence) { >>>> + struct seqno_fence *seqno_fence = to_seqno_fence(fence); >>>> + return seqno_fence->ops->get_driver_name(fence); >>>> +} >>>> + >>>> +static const char *seqno_fence_get_timeline_name(struct fence *fence) { >>>> + struct seqno_fence *seqno_fence = to_seqno_fence(fence); >>>> + return seqno_fence->ops->get_timeline_name(fence); >>>> +} >>>> + >>>> +static bool seqno_enable_signaling(struct fence *fence) >>>> +{ >>>> + struct seqno_fence *seqno_fence = to_seqno_fence(fence); >>>> + return seqno_fence->ops->enable_signaling(fence); >>>> +} >>>> + >>>> +static bool seqno_signaled(struct fence *fence) >>>> +{ >>>> + struct seqno_fence *seqno_fence = to_seqno_fence(fence); >>>> + return seqno_fence->ops->signaled && >>>> seqno_fence->ops->signaled(fence); >>>> +} >>>> + >>>> +static void seqno_release(struct fence *fence) >>>> +{ >>>> + struct seqno_fence *f = to_seqno_fence(fence); >>>> + >>>> + dma_buf_put(f->sync_buf); >>>> + if (f->ops->release) >>>> + f->ops->release(fence); >>>> + else >>>> + kfree(f); >>>> +} >>>> + >>>> +static long seqno_wait(struct fence *fence, bool intr, signed long >>>> timeout) >>>> +{ >>>> + struct seqno_fence *f = to_seqno_fence(fence); >>>> + return f->ops->wait(fence, intr, timeout); >>>> +} >>>> + >>>> +const struct fence_ops seqno_fence_ops = { >>>> + .get_driver_name = seqno_fence_get_driver_name, >>>> + .get_timeline_name = seqno_fence_get_timeline_name, >>>> + .enable_signaling = seqno_enable_signaling, >>>> + .signaled = seqno_signaled, >>>> + .wait = seqno_wait, >>>> + .release = seqno_release, >>>> +}; >>>> +EXPORT_SYMBOL(seqno_fence_ops); >>>> diff --git a/include/linux/seqno-fence.h b/include/linux/seqno-fence.h >>>> new file mode 100644 >>>> index 000000000000..952f7909128c >>>> --- /dev/null >>>> +++ b/include/linux/seqno-fence.h >>>> @@ -0,0 +1,109 @@ >>>> +/* >>>> + * seqno-fence, using a dma-buf to synchronize fencing >>>> + * >>>> + * Copyright (C) 2012 Texas Instruments >>>> + * Copyright (C) 2012 Canonical Ltd >>>> + * Authors: >>>> + * Rob Clark <robdclark(a)gmail.com> >>>> + * Maarten Lankhorst <maarten.lankhorst(a)canonical.com> >>>> + * >>>> + * This program is free software; you can redistribute it and/or modify >>>> it >>>> + * under the terms of the GNU General Public License version 2 as >>>> published by >>>> + * the Free Software Foundation. >>>> + * >>>> + * This program is distributed in the hope that it will be useful, but >>>> WITHOUT >>>> + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY >>>> or >>>> + * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public >>>> License >>>> for >>>> + * more details. >>>> + * >>>> + * You should have received a copy of the GNU General Public License >>>> along with >>>> + * this program. If not, see <http://www.gnu.org/licenses/>. >>>> + */ >>>> + >>>> +#ifndef __LINUX_SEQNO_FENCE_H >>>> +#define __LINUX_SEQNO_FENCE_H >>>> + >>>> +#include <linux/fence.h> >>>> +#include <linux/dma-buf.h> >>>> + >>>> +struct seqno_fence { >>>> + struct fence base; >>>> + >>>> + const struct fence_ops *ops; >>>> + struct dma_buf *sync_buf; >>>> + uint32_t seqno_ofs; >>>> +}; >>>> + >>>> +extern const struct fence_ops seqno_fence_ops; >>>> + >>>> +/** >>>> + * to_seqno_fence - cast a fence to a seqno_fence >>>> + * @fence: fence to cast to a seqno_fence >>>> + * >>>> + * Returns NULL if the fence is not a seqno_fence, >>>> + * or the seqno_fence otherwise. >>>> + */ >>>> +static inline struct seqno_fence * >>>> +to_seqno_fence(struct fence *fence) >>>> +{ >>>> + if (fence->ops != &seqno_fence_ops) >>>> + return NULL; >>>> + return container_of(fence, struct seqno_fence, base); >>>> +} >>>> + >>>> +/** >>>> + * seqno_fence_init - initialize a seqno fence >>>> + * @fence: seqno_fence to initialize >>>> + * @lock: pointer to spinlock to use for fence >>>> + * @sync_buf: buffer containing the memory location to signal on >>>> + * @context: the execution context this fence is a part of >>>> + * @seqno_ofs: the offset within @sync_buf >>>> + * @seqno: the sequence # to signal on >>>> + * @ops: the fence_ops for operations on this seqno fence >>>> + * >>>> + * This function initializes a struct seqno_fence with passed >>>> parameters, >>>> + * and takes a reference on sync_buf which is released on fence >>>> destruction. >>>> + * >>>> + * A seqno_fence is a dma_fence which can complete in software when >>>> + * enable_signaling is called, but it also completes when >>>> + * (s32)((sync_buf)[seqno_ofs] - seqno) >= 0 is true >>>> + * >>>> + * The seqno_fence will take a refcount on the sync_buf until it's >>>> + * destroyed, but actual lifetime of sync_buf may be longer if one of >>>> the >>>> + * callers take a reference to it. >>>> + * >>>> + * Certain hardware have instructions to insert this type of wait >>>> condition >>>> + * in the command stream, so no intervention from software would be >>>> needed. >>>> + * This type of fence can be destroyed before completed, however a >>>> reference >>>> + * on the sync_buf dma-buf can be taken. It is encouraged to re-use the >>>> same >>>> + * dma-buf for sync_buf, since mapping or unmapping the sync_buf to the >>>> + * device's vm can be expensive. >>>> + * >>>> + * It is recommended for creators of seqno_fence to call fence_signal >>>> + * before destruction. This will prevent possible issues from >>>> wraparound >>>> at >>>> + * time of issue vs time of check, since users can check >>>> fence_is_signaled >>>> + * before submitting instructions for the hardware to wait on the >>>> fence. >>>> + * However, when ops.enable_signaling is not called, it doesn't have to >>>> be >>>> + * done as soon as possible, just before there's any real danger of >>>> seqno >>>> + * wraparound. >>>> + */ >>>> +static inline void >>>> +seqno_fence_init(struct seqno_fence *fence, spinlock_t *lock, >>>> + struct dma_buf *sync_buf, uint32_t context, uint32_t >>>> seqno_ofs, >>>> + uint32_t seqno, const struct fence_ops *ops) >>>> +{ >>>> + BUG_ON(!fence || !sync_buf || !ops); >>>> + BUG_ON(!ops->wait || !ops->enable_signaling || >>>> !ops->get_driver_name || !ops->get_timeline_name); >>>> + >>>> + /* >>>> + * ops is used in __fence_init for get_driver_name, so needs to >>>> be >>>> + * initialized first >>>> + */ >>>> + fence->ops = ops; >>>> + __fence_init(&fence->base, &seqno_fence_ops, lock, context, >>>> seqno); >>>> + get_dma_buf(sync_buf); >>>> + fence->sync_buf = sync_buf; >>>> + fence->seqno_ofs = seqno_ofs; >>>> +} >>>> + >>>> +#endif /* __LINUX_SEQNO_FENCE_H */ >>>> >>>> _______________________________________________ >>>> dri-devel mailing list >>>> dri-devel(a)lists.freedesktop.org >>>> http://lists.freedesktop.org/mailman/listinfo/dri-devel >>> >>> >>> _______________________________________________ >>> dri-devel mailing list >>> dri-devel(a)lists.freedesktop.org >>> http://lists.freedesktop.org/mailman/listinfo/dri-devel > >

11 years, 12 months

1
0
0 0

[PATCH v2 0/5] reserved-memory regions/CMA in devicetree, again

by Marek Szyprowski

Hi all! This is an updated version of the second attempt to add basic support for dynamic allocation of memory reserved regions defined in device tree. The initial code for this feature were posted here [1], merged as commit 9d8eab7af79cb4ce2de5de39f82c455b1f796963 ("drivers: of: add initialization code for dma reserved memory") and later reverted by commit 1931ee143b0ab72924944bc06e363d837ba05063. For more information, see [2]. Finally a new bindings has been proposed [3] and Josh Cartwright a few days ago prepared some code which implements those bindings [4]. This finally pushed me again to find some time to finish this task and review the code. Josh agreed to give me the ownership of this series to continue preparing them for mainline inclusion. For more information please refer to the changlelog below. [1]: http://lkml.kernel.org/g/1377527959-5080-1-git-send-email-m.szyprowski@sams… [2]: http://lkml.kernel.org/g/1381476448-14548-1-git-send-email-m.szyprowski@sam… [3]: http://lkml.kernel.org/g/20131030134702.19B57C402A0@trevor.secretlab.ca [4]: http://thread.gmane.org/gmane.linux.documentation/19579 Changelog: v2: - removed copying of the node name - split shared-dma-pool handling into separate files (one for CMA and one for dma_declare_coherent based implementations) for making the code easier to understand - added support for AMBA devices, changed prototypes to use struct decice instead of struct platform_device - renamed some functions to better match other names used in drivers/of/ - restructured the rest of the code a bit for better readability - added 'reusable' property to exmaple linux,cma node in documentation - exclusive dma (dma_coherent) is used for only handling 'shared-dma-pool' regions without 'reusable' property and CMA is used only for handling 'shared-dma-pool' regions with 'reusable' property. v1: http://thread.gmane.org/gmane.linux.documentation/19579 - initial version prepared by Josh Cartwright Summary: Grant Likely (1): of: document bindings for reserved-memory nodes Josh Cartwright (2): drivers: of: implement reserved-memory handling for dma drivers: of: implement reserved-memory handling for cma Marek Szyprowski (2): drivers: of: add initialization code for reserved memory ARM: init: add support for reserved memory defined by device tree .../bindings/reserved-memory/reserved-memory.txt | 138 ++++++++++++ arch/arm/mm/init.c | 3 + drivers/of/Kconfig | 20 ++ drivers/of/Makefile | 3 + drivers/of/of_reserved_mem.c | 219 ++++++++++++++++++++ drivers/of/of_reserved_mem_cma.c | 75 +++++++ drivers/of/of_reserved_mem_dma.c | 78 +++++++ drivers/of/platform.c | 7 + include/asm-generic/vmlinux.lds.h | 11 + include/linux/of_reserved_mem.h | 62 ++++++ 10 files changed, 616 insertions(+) create mode 100644 Documentation/devicetree/bindings/reserved-memory/reserved-memory.txt create mode 100644 drivers/of/of_reserved_mem.c create mode 100644 drivers/of/of_reserved_mem_cma.c create mode 100644 drivers/of/of_reserved_mem_dma.c create mode 100644 include/linux/of_reserved_mem.h -- 1.7.9.5

11 years, 12 months

7
26
0 0

[GIT PULL]: dma-buf updates for 3.14

by Sumit Semwal

Hi Linus, Here's another tiny pull request for dma-buf framework updates; just some debugfs output updates. (There's another patch related to dma-buf, but it'll get upstreamed via Greg-kh's pull request). Could you please pull? The following changes since commit 45f7fdc2ffb9d5af4dab593843e89da70d1259e3: Merge branch 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc (2014-02-11 22:28:47 -0800) are available in the git repository at: git://git.kernel.org/pub/scm/linux/kernel/git/sumits/dma-buf.git tags/dma-buf-for-3.14 for you to fetch changes up to c0b00a525c127d0055c1df6283300e17f601a1a1: dma-buf: update debugfs output (2014-02-13 10:08:52 +0530) ---------------------------------------------------------------- Small dma-buf pull request for 3.14 ---------------------------------------------------------------- Sumit Semwal (1): dma-buf: update debugfs output drivers/base/dma-buf.c | 25 ++++++++++++------------- include/linux/dma-buf.h | 2 +- 2 files changed, 13 insertions(+), 14 deletions(-)

11 years, 12 months

1
0
0 0

KSM and Android

by Pradeep Sawlani

Hello, In pursuit of saving memory on Android, I started experimenting with Kernel Same Page Merging(KSM). Number of pages shared because of KSM is reported by /sys/kernel/mm/pages_sharing. Documentation/vm/ksm.txt explains this as: "pages_sharing - how many more sites are sharing them i.e. how much saved" After enabling KSM on Android device, this number was reported as 19666 pages. Obvious optimization is to find out source of sharing and see if we can avoid duplicate pages at first place. In order to collect the data needed, It needed few modifications(trace_printk) statement in mm/ksm.c. Data should be collected from second cycle because that's when ksm starts merging pages. First KSM cycle is only used to calculate the checksum, pages are added to unstable tree and eventually moved to stable tree after this. After analyzing data from second KSM cycle, few things which stood out: 1. In the same cycle, KSM can scan same page multiple times. Scanning a page involves comparing page with pages in stable tree, if no match is found checksum is calculated. From the look of it, it seems to be cpu intensive operation and impacts dcache as well. 2. Same page which is already shared by multiple process can be replaced by KSM page. In this case, let say a particular page is mapped 24 times and is replaced by KSM page then eventually all 24 entries will point to KSM page. pages_sharing will account for all 24 pages. so pages _sharing does not actually report amount of memory saved.From the above example actual savings is one page. Both cases happen very often with Android because of its architecture - Zygote spawning(fork) multipleapplications. To calculate actual savings, we should account for same page(pfn)replaced by same KSM page only once. In the case 2 example, page_sharing should account only one page.After recalculating memory saving comes out to be 8602 pages (~34MB). I am trying to find out right solution to fix pages_sharing and eventually optimize KSM to scan pageonce even if it is mapped multiple times. Comments? Has anyone tried this before? Thanks, Pradeep

11 years, 12 months

2
2
0 0

2026

2025

2024

2023

2022

2021

2020

2019

2018

2017

2016

2015

2014

2013

2012

2011

Linaro-mm-sig