- Linaro-mm-sig - lists.linaro.org

Re: [Linaro-mm-sig] [PATCH net-next 2/5] dt-bindings: net: brcm,unimac-mdio: Add asp-v2.0

by Rob Herring

On Fri, 24 Sep 2021 14:44:48 -0700, Justin Chen wrote:
> The ASP 2.0 Ethernet controller uses a brcm unimac.
>
> Signed-off-by: Justin Chen <justinpopo6(a)gmail.com>
> Signed-off-by: Florian Fainelli <f.fainelli(a)gmail.com>
> ---
>  Documentation/devicetree/bindings/net/brcm,unimac-mdio.yaml | 1 +
>  1 file changed, 1 insertion(+)
>

Acked-by: Rob Herring <robh(a)kernel.org>

Re: [Linaro-mm-sig] [PATCH v6 2/2] habanalabs: add support for dma-buf exporter

by Jason Gunthorpe

On Thu, Sep 30, 2021 at 03:46:35PM +0300, Oded Gabbay wrote:
> After reading the kernel iommu code, I think this is not relevant
> here, and I'll add a comment appropriately but I'll also write it
> here, and please correct me if my understanding is wrong.
>
> The memory behind this specific dma-buf has *always* resided on the
> device itself, i.e. it lives only in the 'device' domain (after all,
> it maps a PCI bar address which points to the device memory).
> Therefore, it was never in the 'CPU' domain and hence, there is no
> need to perform a sync of the memory to the CPU's cache, as it was
> never inside that cache to begin with.
>
> This is not the same case as with regular memory which is dma-mapped
> and then copied into the device using a dma engine. In that case,
> the memory started in the 'CPU' domain and moved to the 'device'
> domain. When it is unmapped it will indeed be recycled to be used
> for another purpose and therefore we need to sync the CPU cache.
>
> Is my understanding correct ?

It makes sense to me

Jason

Re: [Linaro-mm-sig] [PATCH v6 2/2] habanalabs: add support for dma-buf exporter

by Jason Gunthorpe

On Wed, Sep 29, 2021 at 12:17:35AM +0300, Oded Gabbay wrote:
> On Tue, Sep 28, 2021 at 8:36 PM Jason Gunthorpe <jgg(a)ziepe.ca> wrote:
> >
> > On Sun, Sep 12, 2021 at 07:53:09PM +0300, Oded Gabbay wrote:
> > > From: Tomer Tayar <ttayar(a)habana.ai>
> > >
> > > Implement the calls to the dma-buf kernel api to create a dma-buf
> > > object backed by FD.
> > >
> > > We block the option to mmap the DMA-BUF object because we don't support
> > > DIRECT_IO and implicit P2P.
> >
> > This statement doesn't make sense, you can mmap your dmabuf if you
> > like. All dmabuf mmaps are supposed to set the special bit/etc to
> > exclude them from get_user_pages() anyhow - and since this is BAR
> > memory not struct page memory this driver would be doing it anyhow.
> >
> But we block mmap the dmabuf fd from user-space.
> If you try to do it, you will get MAP_FAILED.

You do, I'm saying the above paragraph explaining *why* that was done
is not correct.

> > > We check the p2p distance using pci_p2pdma_distance_many() and refusing
> > > to map dmabuf in case the distance doesn't allow p2p.
> >
> > Does this actually allow the p2p transfer for your intended use cases?
>
> It depends on the system. If we are working bare-metal, then yes, it allows.
> If inside a VM, then no. The virtualized root complex is not
> white-listed and the kernel can't know the distance.
> But I remember you asked me to add this check, in v3 of the review IIRC.
> I don't mind removing this check if you don't object.

Yes, it is the right code, I was curious how far along things have
gotten.

> > Don't write to the kernel log from user space triggered actions
> at all ?

At all.

> It's the first time I hear about this limitation...

Oh? It is a security issue, we don't want to allow userspace to DOS the
kernel logging.

> How do you tell the user it has done something wrong ?

dev_dbg is the usual way and then users doing debugging can opt in to
the logging.

> > Why doesn't this return a sg_table * and an ERR_PTR?
>
> Basically I modeled this function after amdgpu_vram_mgr_alloc_sgt()
> And in that function they also return int and pass the sg_table as **
>
> If it's critical I can change.

Please follow the normal kernel style

Jason
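
The "normal kernel style" Jason asks for is to return the object pointer directly and encode a failure as a negative errno inside the pointer with `ERR_PTR()`, which callers test with `IS_ERR()`/`PTR_ERR()`, instead of returning `int` and filling a `struct sg_table **` out-parameter. The sketch below illustrates the convention in userspace under stated assumptions: the three macros are simplified local re-implementations modelled on `include/linux/err.h` (not the kernel header itself), and `struct sg_table_stub` and `alloc_sgt_stub()` are hypothetical stand-ins for the habanalabs allocator.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Simplified local copy of the err.h idiom: the top MAX_ERRNO addresses
 * are never valid pointers, so a negative errno can travel through a
 * pointer return value. */
#define MAX_ERRNO 4095

static inline void *ERR_PTR(long error) { return (void *)(intptr_t)error; }
static inline long PTR_ERR(const void *ptr) { return (long)(intptr_t)ptr; }
static inline bool IS_ERR(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* Hypothetical stand-in for struct sg_table. */
struct sg_table_stub {
	unsigned int nents;
};

/* Kernel-style allocator: returns the table or an ERR_PTR(), with no
 * struct sg_table ** out-parameter. */
static struct sg_table_stub *alloc_sgt_stub(unsigned int nents)
{
	struct sg_table_stub *sgt;

	if (nents == 0)
		return ERR_PTR(-EINVAL);

	sgt = calloc(1, sizeof(*sgt));
	if (!sgt)
		return ERR_PTR(-ENOMEM);

	sgt->nents = nents;
	return sgt;
}
```

A caller then writes `sgt = alloc_sgt_stub(n); if (IS_ERR(sgt)) return PTR_ERR(sgt);`, so the success path carries no out-parameter and the error code is not lost.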

Re: [Linaro-mm-sig] [RFC PATCH v2 2/2] RDMA/rxe: Add dma-buf support

by Daniel Vetter

On Wed, Sep 29, 2021 at 01:19:05PM +0900, Shunsuke Mie wrote: > Implement a ib device operation ‘reg_user_mr_dmabuf’. Generate a > rxe_map from the memory space linked the passed dma-buf. > > Signed-off-by: Shunsuke Mie <mie(a)igel.co.jp> > --- > drivers/infiniband/sw/rxe/rxe_loc.h | 2 + > drivers/infiniband/sw/rxe/rxe_mr.c | 118 ++++++++++++++++++++++++++ > drivers/infiniband/sw/rxe/rxe_verbs.c | 34 ++++++++ > drivers/infiniband/sw/rxe/rxe_verbs.h | 2 + > 4 files changed, 156 insertions(+) > > diff --git a/drivers/infiniband/sw/rxe/rxe_loc.h b/drivers/infiniband/sw/rxe/rxe_loc.h > index 1ca43b859d80..8bc19ea1a376 100644 > --- a/drivers/infiniband/sw/rxe/rxe_loc.h > +++ b/drivers/infiniband/sw/rxe/rxe_loc.h > @@ -75,6 +75,8 @@ u8 rxe_get_next_key(u32 last_key); > void rxe_mr_init_dma(struct rxe_pd *pd, int access, struct rxe_mr *mr); > int rxe_mr_init_user(struct rxe_pd *pd, u64 start, u64 length, u64 iova, > int access, struct rxe_mr *mr); > +int rxe_mr_dmabuf_init_user(struct rxe_pd *pd, int fd, u64 start, u64 length, > + u64 iova, int access, struct rxe_mr *mr); > int rxe_mr_init_fast(struct rxe_pd *pd, int max_pages, struct rxe_mr *mr); > int rxe_mr_copy(struct rxe_mr *mr, u64 iova, void *addr, int length, > enum rxe_mr_copy_dir dir); > diff --git a/drivers/infiniband/sw/rxe/rxe_mr.c b/drivers/infiniband/sw/rxe/rxe_mr.c > index 53271df10e47..af6ef671c3a5 100644 > --- a/drivers/infiniband/sw/rxe/rxe_mr.c > +++ b/drivers/infiniband/sw/rxe/rxe_mr.c > @@ -4,6 +4,7 @@ > * Copyright (c) 2015 System Fabric Works, Inc. All rights reserved. 
> */ > > +#include <linux/dma-buf.h> > #include "rxe.h" > #include "rxe_loc.h" > > @@ -245,6 +246,120 @@ int rxe_mr_init_user(struct rxe_pd *pd, u64 start, u64 length, u64 iova, > return err; > } > > +static int rxe_map_dmabuf_mr(struct rxe_mr *mr, > + struct ib_umem_dmabuf *umem_dmabuf) > +{ > + struct rxe_map_set *set; > + struct rxe_phys_buf *buf = NULL; > + struct rxe_map **map; > + void *vaddr, *vaddr_end; > + int num_buf = 0; > + int err; > + size_t remain; > + > + mr->dmabuf_map = kzalloc(sizeof &mr->dmabuf_map, GFP_KERNEL); dmabuf_maps are just tagged pointers (and we could shrink them to actually just a tagged pointer if anyone cares about the overhead of the separate bool), allocating them seperately is overkill. > + if (!mr->dmabuf_map) { > + err = -ENOMEM; > + goto err_out; > + } > + > + err = dma_buf_vmap(umem_dmabuf->dmabuf, mr->dmabuf_map); > + if (err) > + goto err_free_dmabuf_map; > + > + set = mr->cur_map_set; > + set->page_shift = PAGE_SHIFT; > + set->page_mask = PAGE_SIZE - 1; > + > + map = set->map; > + buf = map[0]->buf; > + > + vaddr = mr->dmabuf_map->vaddr; dma_buf_map can be an __iomem too, you shouldn't dig around in this, but use the dma-buf-map.h helpers instead. On x86 (and I think also on most arm) it doesn't matter, but it's kinda not very nice in a pure software driver. If anything is missing in dma-buf-map.h wrappers just add more. Or alternatively you need to fail the import if you can't handle __iomem. Aside from these I think the dma-buf side here for cpu access looks reasonable now. 
-Daniel > + vaddr_end = vaddr + umem_dmabuf->dmabuf->size; > + remain = umem_dmabuf->dmabuf->size; > + > + for (; remain; vaddr += PAGE_SIZE) { > + if (num_buf >= RXE_BUF_PER_MAP) { > + map++; > + buf = map[0]->buf; > + num_buf = 0; > + } > + > + buf->addr = (uintptr_t)vaddr; > + if (remain >= PAGE_SIZE) > + buf->size = PAGE_SIZE; > + else > + buf->size = remain; > + remain -= buf->size; > + > + num_buf++; > + buf++; > + } > + > + return 0; > + > +err_free_dmabuf_map: > + kfree(mr->dmabuf_map); > +err_out: > + return err; > +} > + > +static void rxe_unmap_dmabuf_mr(struct rxe_mr *mr) > +{ > + struct ib_umem_dmabuf *umem_dmabuf = to_ib_umem_dmabuf(mr->umem); > + > + dma_buf_vunmap(umem_dmabuf->dmabuf, mr->dmabuf_map); > + kfree(mr->dmabuf_map); > +} > + > +int rxe_mr_dmabuf_init_user(struct rxe_pd *pd, int fd, u64 start, u64 length, > + u64 iova, int access, struct rxe_mr *mr) > +{ > + struct ib_umem_dmabuf *umem_dmabuf; > + struct rxe_map_set *set; > + int err; > + > + umem_dmabuf = ib_umem_dmabuf_get(pd->ibpd.device, start, length, fd, > + access, NULL); > + if (IS_ERR(umem_dmabuf)) { > + err = PTR_ERR(umem_dmabuf); > + goto err_out; > + } > + > + rxe_mr_init(access, mr); > + > + err = rxe_mr_alloc(mr, ib_umem_num_pages(&umem_dmabuf->umem), 0); > + if (err) { > + pr_warn("%s: Unable to allocate memory for map\n", __func__); > + goto err_release_umem; > + } > + > + mr->ibmr.pd = &pd->ibpd; > + mr->umem = &umem_dmabuf->umem; > + mr->access = access; > + mr->state = RXE_MR_STATE_VALID; > + mr->type = IB_MR_TYPE_USER; > + > + set = mr->cur_map_set; > + set->length = length; > + set->iova = iova; > + set->va = start; > + set->offset = ib_umem_offset(mr->umem); > + > + err = rxe_map_dmabuf_mr(mr, umem_dmabuf); > + if (err) > + goto err_free_map_set; > + > + return 0; > + > +err_free_map_set: > + rxe_mr_free_map_set(mr->num_map, mr->cur_map_set); > +err_release_umem: > + ib_umem_release(&umem_dmabuf->umem); > +err_out: > + return err; > +} > + > int 
rxe_mr_init_fast(struct rxe_pd *pd, int max_pages, struct rxe_mr *mr) > { > int err; > @@ -703,6 +818,9 @@ void rxe_mr_cleanup(struct rxe_pool_entry *arg) > { > struct rxe_mr *mr = container_of(arg, typeof(*mr), pelem); > > + if (mr->umem && mr->umem->is_dmabuf) > + rxe_unmap_dmabuf_mr(mr); > + > ib_umem_release(mr->umem); > > if (mr->cur_map_set) > diff --git a/drivers/infiniband/sw/rxe/rxe_verbs.c b/drivers/infiniband/sw/rxe/rxe_verbs.c > index 9d0bb9aa7514..6191bb4f434d 100644 > --- a/drivers/infiniband/sw/rxe/rxe_verbs.c > +++ b/drivers/infiniband/sw/rxe/rxe_verbs.c > @@ -916,6 +916,39 @@ static struct ib_mr *rxe_reg_user_mr(struct ib_pd *ibpd, > return ERR_PTR(err); > } > > +static struct ib_mr *rxe_reg_user_mr_dmabuf(struct ib_pd *ibpd, u64 start, > + u64 length, u64 iova, int fd, > + int access, struct ib_udata *udata) > +{ > + int err; > + struct rxe_dev *rxe = to_rdev(ibpd->device); > + struct rxe_pd *pd = to_rpd(ibpd); > + struct rxe_mr *mr; > + > + mr = rxe_alloc(&rxe->mr_pool); > + if (!mr) { > + err = -ENOMEM; > + goto err2; > + } > + > + rxe_add_index(mr); > + > + rxe_add_ref(pd); > + > + err = rxe_mr_dmabuf_init_user(pd, fd, start, length, iova, access, mr); > + if (err) > + goto err3; > + > + return &mr->ibmr; > + > +err3: > + rxe_drop_ref(pd); > + rxe_drop_index(mr); > + rxe_drop_ref(mr); > +err2: > + return ERR_PTR(err); > +} > + > static struct ib_mr *rxe_alloc_mr(struct ib_pd *ibpd, enum ib_mr_type mr_type, > u32 max_num_sg) > { > @@ -1081,6 +1114,7 @@ static const struct ib_device_ops rxe_dev_ops = { > .query_qp = rxe_query_qp, > .query_srq = rxe_query_srq, > .reg_user_mr = rxe_reg_user_mr, > + .reg_user_mr_dmabuf = rxe_reg_user_mr_dmabuf, > .req_notify_cq = rxe_req_notify_cq, > .resize_cq = rxe_resize_cq, > > diff --git a/drivers/infiniband/sw/rxe/rxe_verbs.h b/drivers/infiniband/sw/rxe/rxe_verbs.h > index c807639435eb..0aa95ab06b6e 100644 > --- a/drivers/infiniband/sw/rxe/rxe_verbs.h > +++ b/drivers/infiniband/sw/rxe/rxe_verbs.h > @@ -334,6 
+334,8 @@ struct rxe_mr { > > struct rxe_map_set *cur_map_set; > struct rxe_map_set *next_map_set; > + > + struct dma_buf_map *dmabuf_map; > }; > > enum rxe_mw_state { > -- > 2.17.1 > -- Daniel Vetter Software Engineer, Intel Corporation http://blog.ffwll.ch
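
Daniel's review of `rxe_map_dmabuf_mr()` makes two points. First, the `dma_buf_map` is only a tagged pointer and can be embedded rather than allocated separately. Second, its address may be `__iomem`, so the importer should go through accessor helpers (or refuse I/O memory at import time) instead of reading `->vaddr` directly. A userspace model of both ideas follows, built on these assumptions: `struct buf_map` only loosely imitates the `vaddr`/`vaddr_iomem`/`is_iomem` layout of `struct dma_buf_map`, `buf_map_memcpy_from()` and `mr_import()` are hypothetical helpers (the kind Daniel suggests adding if `dma-buf-map.h` lacks one), and plain `memcpy()` stands in for `memcpy_fromio()` because there is no I/O memory in userspace.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Model of a tagged pointer: one address plus an "is I/O memory" flag. */
struct buf_map {
	union {
		void *vaddr_iomem;	/* void __iomem * in the kernel */
		void *vaddr;
	};
	bool is_iomem;
};

/* The importer embeds the map by value: no separate kzalloc()/kfree(). */
struct mr_model {
	struct buf_map map;
	size_t size;
};

/* Hypothetical accessor: callers never dereference the address directly. */
static void buf_map_memcpy_from(void *dst, const struct buf_map *map,
				size_t offset, size_t len)
{
	if (map->is_iomem)
		/* the kernel would use memcpy_fromio() here */
		memcpy(dst, (const char *)map->vaddr_iomem + offset, len);
	else
		memcpy(dst, (const char *)map->vaddr + offset, len);
}

/* An importer that cannot handle I/O memory rejects it up front. */
static int mr_import(struct mr_model *mr, void *addr, bool is_iomem,
		     size_t size, bool allow_iomem)
{
	if (is_iomem && !allow_iomem)
		return -EOPNOTSUPP;

	if (is_iomem)
		mr->map.vaddr_iomem = addr;
	else
		mr->map.vaddr = addr;
	mr->map.is_iomem = is_iomem;
	mr->size = size;
	return 0;
}
```

Embedding the map removes an allocation and an error path. Routing every access through one helper means the `__iomem` case is handled in a single place rather than in every loop that walks the buffer.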

Re: [Linaro-mm-sig] [PATCH v6 0/2] Add p2p via dmabuf to habanalabs

by Daniel Vetter

On Tue, Sep 28, 2021 at 10:04:29AM +0300, Oded Gabbay wrote: > On Thu, Sep 23, 2021 at 12:22 PM Oded Gabbay <ogabbay(a)kernel.org> wrote: > > > > On Sat, Sep 18, 2021 at 11:38 AM Oded Gabbay <ogabbay(a)kernel.org> wrote: > > > > > > On Fri, Sep 17, 2021 at 3:30 PM Daniel Vetter <daniel(a)ffwll.ch> wrote: > > > > > > > > On Thu, Sep 16, 2021 at 10:10:14AM -0300, Jason Gunthorpe wrote: > > > > > On Thu, Sep 16, 2021 at 02:31:34PM +0200, Daniel Vetter wrote: > > > > > > On Wed, Sep 15, 2021 at 10:45:36AM +0300, Oded Gabbay wrote: > > > > > > > On Tue, Sep 14, 2021 at 7:12 PM Jason Gunthorpe <jgg(a)ziepe.ca> wrote: > > > > > > > > > > > > > > > > On Tue, Sep 14, 2021 at 04:18:31PM +0200, Daniel Vetter wrote: > > > > > > > > > On Sun, Sep 12, 2021 at 07:53:07PM +0300, Oded Gabbay wrote: > > > > > > > > > > Hi, > > > > > > > > > > Re-sending this patch-set following the release of our user-space TPC > > > > > > > > > > compiler and runtime library. > > > > > > > > > > > > > > > > > > > > I would appreciate a review on this. > > > > > > > > > > > > > > > > > > I think the big open we have is the entire revoke discussions. Having the > > > > > > > > > option to let dma-buf hang around which map to random local memory ranges, > > > > > > > > > without clear ownership link and a way to kill it sounds bad to me. > > > > > > > > > > > > > > > > > > I think there's a few options: > > > > > > > > > - We require revoke support. But I've heard rdma really doesn't like that, > > > > > > > > > I guess because taking out an MR while holding the dma_resv_lock would > > > > > > > > > be an inversion, so can't be done. Jason, can you recap what exactly the > > > > > > > > > hold-up was again that makes this a no-go? > > > > > > > > > > > > > > > > RDMA HW can't do revoke. > > > > > > > > > > > > Like why? I'm assuming when the final open handle or whatever for that MR > > > > > > is closed, you do clean up everything? Or does that MR still stick around > > > > > > forever too? 
> > > > > > > > > > It is a combination of uAPI and HW specification. > > > > > > > > > > revoke here means you take a MR object and tell it to stop doing DMA > > > > > without causing the MR object to be destructed. > > > > > > > > > > All the drivers can of course destruct the MR, but doing such a > > > > > destruction without explicit synchronization with user space opens > > > > > things up to a serious use-after potential that could be a security > > > > > issue. > > > > > > > > > > When the open handle closes the userspace is synchronized with the > > > > > kernel and we can destruct the HW objects safely. > > > > > > > > > > So, the special HW feature required is 'stop doing DMA but keep the > > > > > object in an error state' which isn't really implemented, and doesn't > > > > > extend very well to other object types beyond simple MRs. > > > > > > > > Yeah revoke without destroying the MR doesn't work, and it sounds like > > > > revoke by destroying the MR just moves the can of worms around to another > > > > place. > > > > > > > > > > 1. User A opens gaudi device, sets up dma-buf export > > > > > > > > > > > > 2. User A registers that with RDMA, or anything else that doesn't support > > > > > > revoke. > > > > > > > > > > > > 3. User A closes gaudi device > > > > > > > > > > > > 4. User B opens gaudi device, assumes that it has full control over the > > > > > > device and uploads some secrets, which happen to end up in the dma-buf > > > > > > region user A set up > > > > > > > > > > I would expect this is blocked so long as the DMABUF exists - eg the > > > > > DMABUF will hold a fget on the FD of #1 until the DMABUF is closed, so > > > > > that #3 can't actually happen. > > > > > > > > > > > It's not mlocked memory, it's mlocked memory and I can exfiltrate > > > > > > it. 
> > > > > > > > > > That's just bug, don't make buggy drivers :) > > > > > > > > Well yeah, but given that habanalabs hand rolled this I can't just check > > > > for the usual things we have to enforce this in drm. And generally you can > > > > just open chardevs arbitrarily, and multiple users fighting over each > > > > another. The troubles only start when you have private state or memory > > > > allocations of some kind attached to the struct file (instead of the > > > > underlying device), or something else that requires device exclusivity. > > > > There's no standard way to do that. > > > > > > > > Plus in many cases you really want revoke on top (can't get that here > > > > unfortunately it seems), and the attempts to get towards a generic > > > > revoke() just never went anywhere. So again it's all hand-rolled > > > > per-subsystem. *insert lament about us not having done this through a > > > > proper subsystem* > > > > > > > > Anyway it sounds like the code takes care of that. > > > > -Daniel > > > > > > Daniel, Jason, > > > Thanks for reviewing this code. > > > > > > Can I get an R-B / A-B from you for this patch-set ? > > > > > > Thanks, > > > Oded > > > > A kind reminder. > > > > Thanks, > > Oded > > Hi, > I know last week was LPC and maybe this got lost in the inbox, so I'm > sending it again to make sure you got my request for R-B / A-B. I was waiting for some clarity from the maintainers summit, but that's still about as unclear as it gets. Either way technically it sounds ok, but I'm a bit burried so didn't look at the code. Acked-by: Daniel Vetter <daniel.vetter(a)ffwll.ch> But looking beyond the strict lens of dma-buf I'm still impressed by the mess this created, to get to the same endpoint of "we open our stack" in the same time it takes others to sort this out. I'm still looking for some kind of plan to fix this. Also you probably want to get Dave to ack this too, I pinged him on irc last week about this after maintainer summit. 
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

Re: [Linaro-mm-sig] [PATCH v6 2/2] habanalabs: add support for dma-buf exporter

by Jason Gunthorpe

On Sun, Sep 12, 2021 at 07:53:09PM +0300, Oded Gabbay wrote: > From: Tomer Tayar <ttayar(a)habana.ai> > > Implement the calls to the dma-buf kernel api to create a dma-buf > object backed by FD. > > We block the option to mmap the DMA-BUF object because we don't support > DIRECT_IO and implicit P2P. This statement doesn't make sense, you can mmap your dmabuf if you like. All dmabuf mmaps are supposed to set the special bit/etc to exclude them from get_user_pages() anyhow - and since this is BAR memory not struct page memory this driver would be doing it anyhow. > We check the p2p distance using pci_p2pdma_distance_many() and refusing > to map dmabuf in case the distance doesn't allow p2p. Does this actually allow the p2p transfer for your intended use cases? > diff --git a/drivers/misc/habanalabs/common/memory.c b/drivers/misc/habanalabs/common/memory.c > index 33986933aa9e..8cf5437c0390 100644 > +++ b/drivers/misc/habanalabs/common/memory.c > @@ -1,7 +1,7 @@ > // SPDX-License-Identifier: GPL-2.0 > > /* > - * Copyright 2016-2019 HabanaLabs, Ltd. > + * Copyright 2016-2021 HabanaLabs, Ltd. > * All Rights Reserved. > */ > > @@ -11,11 +11,13 @@ > > #include <linux/uaccess.h> > #include <linux/slab.h> > +#include <linux/pci-p2pdma.h> > > #define HL_MMU_DEBUG 0 > > /* use small pages for supporting non-pow2 (32M/40M/48M) DRAM phys page sizes */ > -#define DRAM_POOL_PAGE_SIZE SZ_8M > +#define DRAM_POOL_PAGE_SIZE SZ_8M > + ?? 
> /* > * The va ranges in context object contain a list with the available chunks of > @@ -347,6 +349,13 @@ static int free_device_memory(struct hl_ctx *ctx, struct hl_mem_in *args) > return -EINVAL; > } > > + if (phys_pg_pack->exporting_cnt) { > + dev_err(hdev->dev, > + "handle %u is exported, cannot free\n", handle); > + spin_unlock(&vm->idr_lock); Don't write to the kernel log from user space triggered actions > +static int alloc_sgt_from_device_pages(struct hl_device *hdev, > + struct sg_table **sgt, u64 *pages, > + u64 npages, u64 page_size, > + struct device *dev, > + enum dma_data_direction dir) Why doesn't this return a sg_table * and an ERR_PTR? > +{ > + u64 chunk_size, bar_address, dma_max_seg_size; > + struct asic_fixed_properties *prop; > + int rc, i, j, nents, cur_page; > + struct scatterlist *sg; > + > + prop = &hdev->asic_prop; > + > + dma_max_seg_size = dma_get_max_seg_size(dev); > + > + /* We would like to align the max segment size to PAGE_SIZE, so the > + * SGL will contain aligned addresses that can be easily mapped to > + * an MMU > + */ > + dma_max_seg_size = ALIGN_DOWN(dma_max_seg_size, PAGE_SIZE); > + if (dma_max_seg_size < PAGE_SIZE) { > + dev_err_ratelimited(hdev->dev, > + "dma_max_seg_size %llu can't be smaller than PAGE_SIZE\n", > + dma_max_seg_size); > + return -EINVAL; > + } > + > + *sgt = kzalloc(sizeof(**sgt), GFP_KERNEL); > + if (!*sgt) > + return -ENOMEM; > + > + /* If the size of each page is larger than the dma max segment size, > + * then we can't combine pages and the number of entries in the SGL > + * will just be the > + * <number of pages> * <chunks of max segment size in each page> > + */ > + if (page_size > dma_max_seg_size) > + nents = npages * DIV_ROUND_UP_ULL(page_size, dma_max_seg_size); > + else > + /* Get number of non-contiguous chunks */ > + for (i = 1, nents = 1, chunk_size = page_size ; i < npages ; i++) { > + if (pages[i - 1] + page_size != pages[i] || > + chunk_size + page_size > dma_max_seg_size) { > + 
nents++; > + chunk_size = page_size; > + continue; > + } > + > + chunk_size += page_size; > + } > + > + rc = sg_alloc_table(*sgt, nents, GFP_KERNEL | __GFP_ZERO); > + if (rc) > + goto error_free; > + > + /* Because we are not going to include a CPU list we want to have some > + * chance that other users will detect this by setting the orig_nents > + * to 0 and using only nents (length of DMA list) when going over the > + * sgl > + */ > + (*sgt)->orig_nents = 0; Maybe do this at the end so you'd have to undo it on the error path? > + cur_page = 0; > + > + if (page_size > dma_max_seg_size) { > + u64 size_left, cur_device_address = 0; > + > + size_left = page_size; > + > + /* Need to split each page into the number of chunks of > + * dma_max_seg_size > + */ > + for_each_sgtable_dma_sg((*sgt), sg, i) { > + if (size_left == page_size) > + cur_device_address = > + pages[cur_page] - prop->dram_base_address; > + else > + cur_device_address += dma_max_seg_size; > + > + chunk_size = min(size_left, dma_max_seg_size); > + > + bar_address = hdev->dram_pci_bar_start + cur_device_address; > + > + rc = set_dma_sg(sg, bar_address, chunk_size, dev, dir); > + if (rc) > + goto error_unmap; > + > + if (size_left > dma_max_seg_size) { > + size_left -= dma_max_seg_size; > + } else { > + cur_page++; > + size_left = page_size; > + } > + } > + } else { > + /* Merge pages and put them into the scatterlist */ > + for_each_sgtable_dma_sg((*sgt), sg, i) { > + chunk_size = page_size; > + for (j = cur_page + 1 ; j < npages ; j++) { > + if (pages[j - 1] + page_size != pages[j] || > + chunk_size + page_size > dma_max_seg_size) > + break; > + > + chunk_size += page_size; > + } > + > + bar_address = hdev->dram_pci_bar_start + > + (pages[cur_page] - prop->dram_base_address); > + > + rc = set_dma_sg(sg, bar_address, chunk_size, dev, dir); > + if (rc) > + goto error_unmap; > + > + cur_page = j; > + } > + } We have this sg_append stuff now that is intended to help building these things. 
It can only build CPU page lists, not these DMA lists, but I do wonder if open coding in drivers is slipping back a bit. Especially since AMD seems to be doing something different. Could the DMABUF layer gain some helpers styled after the sg_append to simplify building these things? and convert the AMD driver of course. > +static int hl_dmabuf_attach(struct dma_buf *dmabuf, > + struct dma_buf_attachment *attachment) > +{ > + struct hl_dmabuf_wrapper *hl_dmabuf; > + struct hl_device *hdev; > + int rc; > + > + hl_dmabuf = dmabuf->priv; > + hdev = hl_dmabuf->ctx->hdev; > + > + rc = pci_p2pdma_distance_many(hdev->pdev, &attachment->dev, 1, true); > + > + if (rc < 0) > + attachment->peer2peer = false; Extra blank line > + > + return 0; > +} > + > +static struct sg_table *hl_map_dmabuf(struct dma_buf_attachment *attachment, > + enum dma_data_direction dir) > +{ > + struct dma_buf *dma_buf = attachment->dmabuf; > + struct hl_vm_phys_pg_pack *phys_pg_pack; > + struct hl_dmabuf_wrapper *hl_dmabuf; > + struct hl_device *hdev; > + struct sg_table *sgt; > + int rc; > + > + hl_dmabuf = dma_buf->priv; > + hdev = hl_dmabuf->ctx->hdev; > + phys_pg_pack = hl_dmabuf->phys_pg_pack; > + > + if (!attachment->peer2peer) { > + dev_err(hdev->dev, > + "Failed to map dmabuf because p2p is disabled\n"); > + return ERR_PTR(-EPERM); User triggered printable again? > +static void hl_unmap_dmabuf(struct dma_buf_attachment *attachment, > + struct sg_table *sgt, > + enum dma_data_direction dir) > +{ > + struct scatterlist *sg; > + int i; > + > + for_each_sgtable_dma_sg(sgt, sg, i) > + dma_unmap_resource(attachment->dev, sg_dma_address(sg), > + sg_dma_len(sg), dir, > + DMA_ATTR_SKIP_CPU_SYNC); Why can we skip the CPU_SYNC? Seems like a comment is needed Something has to do a CPU_SYNC before recylcing this memory for another purpose, where is it? 
> +static void hl_release_dmabuf(struct dma_buf *dmabuf) > +{ > + struct hl_dmabuf_wrapper *hl_dmabuf = dmabuf->priv; Maybe hl_dmabuf_wrapper should be hl_dmabuf_priv > + * export_dmabuf_from_addr() - export a dma-buf object for the given memory > + * address and size. > + * @ctx: pointer to the context structure. > + * @device_addr: device memory physical address. > + * @size: size of device memory. > + * @flags: DMA-BUF file/FD flags. > + * @dmabuf_fd: pointer to result FD that represents the dma-buf object. > + * > + * Create and export a dma-buf object for an existing memory allocation inside > + * the device memory, and return a FD which is associated with the dma-buf > + * object. > + * > + * Return: 0 on success, non-zero for failure. > + */ > +static int export_dmabuf_from_addr(struct hl_ctx *ctx, u64 device_addr, > + u64 size, int flags, int *dmabuf_fd) > +{ > + struct hl_dmabuf_wrapper *hl_dmabuf; > + struct hl_device *hdev = ctx->hdev; > + struct asic_fixed_properties *prop; > + u64 bar_address; > + int rc; > + > + prop = &hdev->asic_prop; > + > + if (!IS_ALIGNED(device_addr, PAGE_SIZE)) { > + dev_err_ratelimited(hdev->dev, > + "address of exported device memory should be aligned to 0x%lx, address 0x%llx\n", > + PAGE_SIZE, device_addr); > + return -EINVAL; > + } > + > + if (size < PAGE_SIZE) { > + dev_err_ratelimited(hdev->dev, > + "size %llu of exported device memory should be equal to or greater than %lu\n", > + size, PAGE_SIZE); > + return -EINVAL; > + } > + > + if (device_addr < prop->dram_user_base_address || > + device_addr + size > prop->dram_end_address || > + device_addr + size < device_addr) { > + dev_err_ratelimited(hdev->dev, > + "DRAM memory range is outside of DRAM boundaries, address 0x%llx, size 0x%llx\n", > + device_addr, size); > + return -EINVAL; > + } > + > + bar_address = hdev->dram_pci_bar_start + > + (device_addr - prop->dram_base_address); > + > + if (bar_address + size > > + hdev->dram_pci_bar_start + prop->dram_pci_bar_size || > 
+ bar_address + size < bar_address) { > + dev_err_ratelimited(hdev->dev, > + "DRAM memory range is outside of PCI BAR boundaries, address 0x%llx, size 0x%llx\n", > + device_addr, size); > + return -EINVAL; > + } More prints from userspace > +static int export_dmabuf_from_handle(struct hl_ctx *ctx, u64 handle, int flags, > + int *dmabuf_fd) > +{ > + struct hl_vm_phys_pg_pack *phys_pg_pack; > + struct hl_dmabuf_wrapper *hl_dmabuf; > + struct hl_device *hdev = ctx->hdev; > + struct asic_fixed_properties *prop; > + struct hl_vm *vm = &hdev->vm; > + u64 bar_address; > + u32 idr_handle; > + int rc, i; > + > + prop = &hdev->asic_prop; > + > + idr_handle = lower_32_bits(handle); Why silent truncation? Shouldn't setting the upper 32 bits be an error? > + case HL_MEM_OP_EXPORT_DMABUF_FD: > + rc = export_dmabuf_from_addr(ctx, > + args->in.export_dmabuf_fd.handle, > + args->in.export_dmabuf_fd.mem_size, > + args->in.flags, > + &dmabuf_fd); > + memset(args, 0, sizeof(*args)); > + args->out.fd = dmabuf_fd; Would expect the installed fd to be the positive return, not a pointer Jason
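
Jason suggests that the scatterlist-building logic in `alloc_sgt_from_device_pages()` could become a shared helper, in the style of `sg_append`. One self-contained piece of it is the entry count. The following sketch extracts that count into a testable function. The name `count_sg_entries()` is hypothetical, and the logic follows the quoted patch. When a device page is larger than the DMA max segment size, each page is split into `DIV_ROUND_UP(page_size, max_seg)` entries. Otherwise, physically contiguous pages are merged until adding one more page would exceed the max segment size.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper mirroring the patch: number of scatterlist entries
 * for npages device pages of page_size bytes, given the importer's DMA
 * max segment size (already aligned down to PAGE_SIZE by the caller). */
static uint64_t count_sg_entries(const uint64_t *pages, uint64_t npages,
				 uint64_t page_size, uint64_t max_seg)
{
	uint64_t nents, chunk, i;

	/* Guard added in this sketch; the patch assumes npages >= 1. */
	if (npages == 0)
		return 0;

	/* Large pages: every page is split, nothing is merged. */
	if (page_size > max_seg)
		return npages * ((page_size + max_seg - 1) / max_seg);

	/* Small pages: open a new entry when contiguity breaks or the
	 * current chunk would grow past max_seg. */
	for (i = 1, nents = 1, chunk = page_size; i < npages; i++) {
		if (pages[i - 1] + page_size != pages[i] ||
		    chunk + page_size > max_seg) {
			nents++;
			chunk = page_size;
			continue;
		}
		chunk += page_size;
	}
	return nents;
}
```

For example, pages at 0x0, 0x1000, 0x2000 and 0x10000 with 4 KiB pages need three entries when `max_seg` is 8 KiB, but only two when `max_seg` is 64 KiB, because the first three pages can then be merged.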

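Jason's last two comments on the export path ask for two changes. First, setting the upper 32 bits of `handle` should be an error rather than being silently dropped by `lower_32_bits()`. Second, the newly installed fd should come back as the positive return value instead of through an `int *` out-parameter. The userspace sketch below models both conventions. `export_from_handle()` and `mem_ioctl_export()` are hypothetical, and `model_install_fd()` is a stand-in for the step a real exporter performs with `dma_buf_fd()`, which also returns an fd or a negative errno. Treating handle 0 as missing is an arbitrary rule of this model.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Stand-in for dma_buf_fd(): hands out increasing fds starting at 3. */
static int model_install_fd(void)
{
	static int next_fd = 3;

	return next_fd++;
}

/* Hypothetical exporter: returns the new fd (>= 0) or a negative errno. */
static int export_from_handle(uint64_t handle)
{
	uint32_t idr_handle;

	/* Reject instead of silently truncating to the low 32 bits. */
	if (handle >> 32)
		return -EINVAL;

	idr_handle = (uint32_t)handle;
	if (idr_handle == 0)	/* model only: 0 means "no such handle" */
		return -ENOENT;

	return model_install_fd();
}

/* ioctl-style caller: the fd arrives as the return value and is copied
 * to the output field only once it is known to be valid. */
static int mem_ioctl_export(uint64_t handle, int32_t *out_fd)
{
	int rc = export_from_handle(handle);

	if (rc < 0)
		return rc;

	*out_fd = rc;
	return 0;
}
```
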
Re: [Linaro-mm-sig] [PATCH v6 1/2] habanalabs: define uAPI to export FD for DMA-BUF

by Jason Gunthorpe

On Sun, Sep 12, 2021 at 07:53:08PM +0300, Oded Gabbay wrote:
>  	/* HL_MEM_OP_* */
>  	__u32 op;
> -	/* HL_MEM_* flags */
> +	/* HL_MEM_* flags.
> +	 * For the HL_MEM_OP_EXPORT_DMABUF_FD opcode, this field holds the
> +	 * DMA-BUF file/FD flags.
> +	 */
>  	__u32 flags;
>  	/* Context ID - Currently not in use */
>  	__u32 ctx_id;
> @@ -1072,6 +1091,13 @@ struct hl_mem_out {
>
>  			__u32 pad;
>  		};
> +
> +		/* Returned in HL_MEM_OP_EXPORT_DMABUF_FD. Represents the
> +		 * DMA-BUF object that was created to describe a memory
> +		 * allocation on the device's memory space. The FD should be
> +		 * passed to the importer driver
> +		 */
> +		__u64 fd;

fd's should be a s32 type in a fixed width uapi.

I usually expect to see the uapi changes inside the commit that
consumes them, splitting the patch like this seems strange but
harmless.

Jason
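
Jason's point is that an fd is an `int` inside the kernel, so in a fixed-width uAPI struct the natural type is a signed 32-bit field. Explicit padding then keeps the layout identical for 32-bit and 64-bit userspace. The following sketch shows such a layout under stated assumptions: `<stdint.h>` types stand in for the kernel's `__s32`/`__u32`, and `struct mem_out_export` is a hypothetical name, not the actual habanalabs uAPI.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical output struct for an "export fd" operation.
 * int32_t/uint32_t correspond to the kernel's __s32/__u32. */
struct mem_out_export {
	int32_t fd;	/* signed, matching the int the kernel returns */
	uint32_t pad;	/* explicit padding: 8 bytes on every ABI */
};

/* Compile-time layout checks of the kind uAPI headers benefit from. */
_Static_assert(sizeof(struct mem_out_export) == 8, "unexpected size");
_Static_assert(offsetof(struct mem_out_export, pad) == 4,
	       "unexpected offset");
```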

[PATCH v3 0/9] dma-fence: Deadline awareness

by Rob Clark

From: Rob Clark <robdclark(a)chromium.org>

This series adds deadline awareness to fences, so realtime deadlines
such as vblank can be communicated to the fence signaller for power/
frequency management decisions.

This is partially inspired by a trick i915 does, but implemented
via dma-fence for a couple of reasons:

1) To continue to be able to use the atomic helpers
2) To support cases where display and gpu are different drivers

This iteration adds a dma-fence ioctl to set a deadline (both to
support igt-tests, and compositors which delay decisions about which
client buffer to display), and a sw_sync ioctl to read back the
deadline. IGT tests utilizing these can be found at:

  https://gitlab.freedesktop.org/robclark/igt-gpu-tools/-/commits/fence-deadl…

v1: https://patchwork.freedesktop.org/series/93035/
v2: Move filtering out of later deadlines to fence implementation
    to avoid increasing the size of dma_fence
v3: Add support in fence-array and fence-chain; Add some uabi to
    support igt tests and userspace compositors.

Rob Clark (9):
  dma-fence: Add deadline awareness
  drm/vblank: Add helper to get next vblank time
  drm/atomic-helper: Set fence deadline for vblank
  drm/scheduler: Add fence deadline support
  drm/msm: Add deadline based boost support
  dma-buf/fence-array: Add fence deadline support
  dma-buf/fence-chain: Add fence deadline support
  dma-buf/sync_file: Add SET_DEADLINE ioctl
  dma-buf/sw_sync: Add fence deadline support

 drivers/dma-buf/dma-fence-array.c       | 11 ++++
 drivers/dma-buf/dma-fence-chain.c       | 13 +++++
 drivers/dma-buf/dma-fence.c             | 20 +++++++
 drivers/dma-buf/sw_sync.c               | 58 +++++++++++++++++++
 drivers/dma-buf/sync_debug.h            |  2 +
 drivers/dma-buf/sync_file.c             | 19 +++++++
 drivers/gpu/drm/drm_atomic_helper.c     | 36 ++++++++++++
 drivers/gpu/drm/drm_vblank.c            | 32 +++++++++++
 drivers/gpu/drm/msm/msm_fence.c         | 76 +++++++++++++++++++++++++
 drivers/gpu/drm/msm/msm_fence.h         | 20 +++++++
 drivers/gpu/drm/msm/msm_gpu.h           |  1 +
 drivers/gpu/drm/msm/msm_gpu_devfreq.c   | 20 +++++++
 drivers/gpu/drm/scheduler/sched_fence.c | 34 +++++++++++
 drivers/gpu/drm/scheduler/sched_main.c  |  2 +-
 include/drm/drm_vblank.h                |  1 +
 include/drm/gpu_scheduler.h             |  8 +++
 include/linux/dma-fence.h               | 16 ++++++
 include/uapi/linux/sync_file.h          | 20 +++++++
 18 files changed, 388 insertions(+), 1 deletion(-)

-- 
2.31.1
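
The v2 note means `struct dma_fence` itself does not grow a deadline field. Instead, each implementation that cares keeps its own deadline and ignores any that is later than one it already has. The following userspace model sketches that filtering under stated assumptions: `struct model_fence`, both functions, and the `int64_t` nanosecond values (standing in for `ktime_t`) are hypothetical. The array helper assumes, based on the "dma-buf/fence-array: Add fence deadline support" patch title, that an array fence simply forwards the deadline to each member fence.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Model of a fence implementation that tracks its own deadline. */
struct model_fence {
	bool has_deadline;
	int64_t deadline_ns;	/* stands in for ktime_t */
	unsigned int boosts;	/* times a new, earlier deadline arrived */
};

/* Keep only the earliest deadline; later ones are filtered out here,
 * in the implementation, not in the generic fence structure. */
static void model_fence_set_deadline(struct model_fence *f,
				     int64_t deadline_ns)
{
	if (f->has_deadline && deadline_ns >= f->deadline_ns)
		return;

	f->has_deadline = true;
	f->deadline_ns = deadline_ns;
	f->boosts++;	/* e.g. where a driver might raise GPU frequency */
}

/* Assumed array behaviour: forward the deadline to every member. */
static void model_array_set_deadline(struct model_fence *fences, size_t n,
				     int64_t deadline_ns)
{
	for (size_t i = 0; i < n; i++)
		model_fence_set_deadline(&fences[i], deadline_ns);
}
```

Filtering in the implementation means a fence that never boosts (no `set_deadline` support) pays nothing, which is the size saving the v2 note refers to.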

[PATCH 1/4] dma-buf: add dma_fence_describe and dma_resv_describe

by Christian König

Add functions to dump dma_fence and dma_resv objects into a seq_file and
use them for printing the debugfs informations.

Signed-off-by: Christian König <christian.koenig(a)amd.com>
---
 drivers/dma-buf/dma-buf.c   | 11 +----------
 drivers/dma-buf/dma-fence.c | 16 ++++++++++++++++
 drivers/dma-buf/dma-resv.c  | 23 +++++++++++++++++++++++
 include/linux/dma-fence.h   |  1 +
 include/linux/dma-resv.h    |  1 +
 5 files changed, 42 insertions(+), 10 deletions(-)

diff --git a/drivers/dma-buf/dma-buf.c b/drivers/dma-buf/dma-buf.c
index d35c71743ccb..4975c9289b02 100644
--- a/drivers/dma-buf/dma-buf.c
+++ b/drivers/dma-buf/dma-buf.c
@@ -1368,8 +1368,6 @@ static int dma_buf_debug_show(struct seq_file *s, void *unused)
 {
 	struct dma_buf *buf_obj;
 	struct dma_buf_attachment *attach_obj;
-	struct dma_resv_iter cursor;
-	struct dma_fence *fence;
 	int count = 0, attach_count;
 	size_t size = 0;
 	int ret;
@@ -1397,14 +1395,7 @@ static int dma_buf_debug_show(struct seq_file *s, void *unused)
 				file_inode(buf_obj->file)->i_ino,
 				buf_obj->name ?: "");
 
-		dma_resv_for_each_fence(&cursor, buf_obj->resv, true, fence) {
-			seq_printf(s, "\t%s fence: %s %s %ssignalled\n",
-				   dma_resv_iter_is_exclusive(&cursor) ?
-					"Exclusive" : "Shared",
-				   fence->ops->get_driver_name(fence),
-				   fence->ops->get_timeline_name(fence),
-				   dma_fence_is_signaled(fence) ? "" : "un");
-		}
+		dma_resv_describe(buf_obj->resv, s);
 
 		seq_puts(s, "\tAttached Devices:\n");
 		attach_count = 0;
diff --git a/drivers/dma-buf/dma-fence.c b/drivers/dma-buf/dma-fence.c
index 1e82ecd443fa..5175adf58644 100644
--- a/drivers/dma-buf/dma-fence.c
+++ b/drivers/dma-buf/dma-fence.c
@@ -907,6 +907,22 @@ dma_fence_wait_any_timeout(struct dma_fence **fences, uint32_t count,
 }
 EXPORT_SYMBOL(dma_fence_wait_any_timeout);
 
+/**
+ * dma_fence_describe - Dump fence describtion into seq_file
+ * @fence: the 6fence to describe
+ * @seq: the seq_file to put the textual description into
+ *
+ * Dump a textual description of the fence and it's state into the seq_file.
+ */
+void dma_fence_describe(struct dma_fence *fence, struct seq_file *seq)
+{
+	seq_printf(seq, "%s %s seq %llu %ssignalled\n",
+		   fence->ops->get_driver_name(fence),
+		   fence->ops->get_timeline_name(fence), fence->seqno,
+		   dma_fence_is_signaled(fence) ? "" : "un");
+}
+EXPORT_SYMBOL(dma_fence_describe);
+
 /**
  * dma_fence_init - Initialize a custom fence.
  * @fence: the fence to initialize
diff --git a/drivers/dma-buf/dma-resv.c b/drivers/dma-buf/dma-resv.c
index 266ec9e3caef..6bb25d53e702 100644
--- a/drivers/dma-buf/dma-resv.c
+++ b/drivers/dma-buf/dma-resv.c
@@ -38,6 +38,7 @@
 #include <linux/mm.h>
 #include <linux/sched/mm.h>
 #include <linux/mmu_notifier.h>
+#include <linux/seq_file.h>
 
 /**
  * DOC: Reservation Object Overview
@@ -654,6 +655,28 @@ bool dma_resv_test_signaled(struct dma_resv *obj, bool test_all)
 }
 EXPORT_SYMBOL_GPL(dma_resv_test_signaled);
 
+/**
+ * dma_resv_describe - Dump description of the resv object into seq_file
+ * @obj: the reservation object
+ * @seq: the seq_file to dump the description into
+ *
+ * Dump a textual description of the fences inside an dma_resv object into the
+ * seq_file.
+ */
+void dma_resv_describe(struct dma_resv *obj, struct seq_file *seq)
+{
+	struct dma_resv_iter cursor;
+	struct dma_fence *fence;
+
+	dma_resv_for_each_fence(&cursor, obj, true, fence) {
+		seq_printf(seq, "\t%s fence:",
+			   dma_resv_iter_is_exclusive(&cursor) ?
+				"Exclusive" : "Shared");
+		dma_fence_describe(fence, seq);
+	}
+}
+EXPORT_SYMBOL_GPL(dma_resv_describe);
+
 #if IS_ENABLED(CONFIG_LOCKDEP)
 static int __init dma_resv_lockdep(void)
 {
diff --git a/include/linux/dma-fence.h b/include/linux/dma-fence.h
index a706b7bf51d7..1ea691753bd3 100644
--- a/include/linux/dma-fence.h
+++ b/include/linux/dma-fence.h
@@ -264,6 +264,7 @@ void dma_fence_init(struct dma_fence *fence, const struct dma_fence_ops *ops,
 
 void dma_fence_release(struct kref *kref);
 void dma_fence_free(struct dma_fence *fence);
+void dma_fence_describe(struct dma_fence *fence, struct seq_file *seq);
 
 /**
  * dma_fence_put - decreases refcount of the fence
diff --git a/include/linux/dma-resv.h b/include/linux/dma-resv.h
index d4b4cd43f0f1..49c0152073fd 100644
--- a/include/linux/dma-resv.h
+++ b/include/linux/dma-resv.h
@@ -486,5 +486,6 @@ int dma_resv_copy_fences(struct dma_resv *dst, struct dma_resv *src);
 long dma_resv_wait_timeout(struct dma_resv *obj, bool wait_all, bool intr,
 			   unsigned long timeout);
 bool dma_resv_test_signaled(struct dma_resv *obj, bool test_all);
+void dma_resv_describe(struct dma_resv *obj, struct seq_file *seq);
 
 #endif /* _LINUX_RESERVATION_H */
-- 
2.25.1

Re: [Linaro-mm-sig] [PATCH net-next 3/5] net: bcmasp: Add support for ASP2.0 Ethernet controller

by Andrew Lunn

> > > +static int bcmasp_set_priv_flags(struct net_device *dev, u32 flags) > > > +{ > > > + struct bcmasp_intf *intf = netdev_priv(dev); > > > + > > > + intf->wol_keep_rx_en = flags & BCMASP_WOL_KEEP_RX_EN ? 1 : 0; > > > + > > > + return 0; > > > > Please could you explain this some more. How can you disable RX and > > still have WoL working? > > Wake-on-LAN using Magic Packets and network filters requires keeping the > UniMAC's receiver turned on, and then the packets feed into the Magic Packet > Detector (MPD) block or the network filter block. In that mode DRAM is in > self refresh and there is local matching of frames into a tiny FIFO however > in the case of magic packets the packets leading to a wake-up are dropped as > there is nowhere to store them. In the case of a network filter match (e.g.: > matching a multicast IP address plus protocol, plus source/destination > ports) the packets are also discarded because the receive DMA was shut down. > > When the wol_keep_rx_en flag is set, the above happens but we also allow the > packets that did match a network filter to reach the small FIFO (Justin > would know how many entries are there) that is used to push the packets to > DRAM. The packet contents are held in there until the system wakes up which > is usually just a few hundred microseconds after we received a packet > that triggered a wake-up. Once we overflow the receive DMA FIFO capacity > subsequent packets get dropped which is fine since we are usually talking > about very low bit rates, and we only try to push to DRAM the packets of > interest, that is those for which we have a network filter. > > This is convenient in scenarios where you want to wake-up from multicast DNS > (e.g.: wake on Googlecast, Bonjour etc.) because then the packet that > resulted in the system wake-up is not discarded but is then delivered to the > network stack. Thanks for the explanation. It would be easier for the user if you automate this.
Enable it by default for WoL types which have user content? > > > + /* Per ch */ > > > + intf->tx_spb_dma = priv->base + TX_SPB_DMA_OFFSET(intf); > > > + intf->res.tx_spb_ctrl = priv->base + TX_SPB_CTRL_OFFSET(intf); > > > + /* > > > + * Stop gap solution. This should be removed when 72165a0 is > > > + * deprecated > > > + */ > > > > Is that an internal commit? > > Yes this is a revision of the silicon that is not meant to see the light of > day. So this can all be removed? Andrew

Re: [Linaro-mm-sig] [PATCH net-next 1/5] dt-bindings: net: Brcm ASP 2.0 Ethernet controller

by Rob Herring

On Fri, 24 Sep 2021 14:44:47 -0700, Justin Chen wrote: > From: Florian Fainelli <f.fainelli(a)gmail.com> > > Add a binding document for the Broadcom ASP 2.0 Ethernet controller. > > Signed-off-by: Florian Fainelli <f.fainelli(a)gmail.com> > Signed-off-by: Justin Chen <justinpopo6(a)gmail.com> > --- > .../devicetree/bindings/net/brcm,asp-v2.0.yaml | 147 +++++++++++++++++++++ > 1 file changed, 147 insertions(+) > create mode 100644 Documentation/devicetree/bindings/net/brcm,asp-v2.0.yaml > My bot found errors running 'make DT_CHECKER_FLAGS=-m dt_binding_check' on your patch (DT_CHECKER_FLAGS is new in v5.13): yamllint warnings/errors: ./Documentation/devicetree/bindings/net/brcm,asp-v2.0.yaml:79:10: [warning] wrong indentation: expected 10 but found 9 (indentation) dtschema/dtc warnings/errors: /builds/robherring/linux-dt-review/Documentation/devicetree/bindings/net/brcm,asp-v2.0.example.dt.yaml: asp@9c00000: 'mdio@c614', 'mdio@ce14' do not match any of the regexes: 'pinctrl-[0-9]+' From schema: /builds/robherring/linux-dt-review/Documentation/devicetree/bindings/net/brcm,asp-v2.0.yaml Documentation/devicetree/bindings/net/brcm,asp-v2.0.example.dt.yaml:0:0: /example-0/asp@9c00000/mdio@c614: failed to match any schema with compatible: ['brcm,asp-v2.0-mdio'] Documentation/devicetree/bindings/net/brcm,asp-v2.0.example.dt.yaml:0:0: /example-0/asp@9c00000/mdio@ce14: failed to match any schema with compatible: ['brcm,asp-v2.0-mdio'] doc reference errors (make refcheckdocs): See https://patchwork.ozlabs.org/patch/1532528 This check can fail if there are any dependencies. The base for a patch series is generally the most recent rc1. If you already ran 'make dt_binding_check' and didn't see the above error(s), then make sure 'yamllint' is installed and dt-schema is up to date: pip3 install dtschema --upgrade Please check and re-submit.

Re: [Linaro-mm-sig] [PATCH net-next 2/5] dt-bindings: net: brcm, unimac-mdio: Add asp-v2.0

by Rob Herring

On Fri, 24 Sep 2021 14:44:48 -0700, Justin Chen wrote: > The ASP 2.0 Ethernet controller uses a brcm unimac. > > Signed-off-by: Justin Chen <justinpopo6(a)gmail.com> > Signed-off-by: Florian Fainelli <f.fainelli(a)gmail.com> > --- > Documentation/devicetree/bindings/net/brcm,unimac-mdio.yaml | 1 + > 1 file changed, 1 insertion(+) > Running 'make dtbs_check' with the schema in this patch gives the following warnings. Consider if they are expected or the schema is incorrect. These may not be new warnings. Note that it is not yet a requirement to have 0 warnings for dtbs_check. This will change in the future. Full log is available here: https://patchwork.ozlabs.org/patch/1532529 mdio@e14: #address-cells:0:0: 1 was expected arch/arm64/boot/dts/broadcom/bcm2711-rpi-400.dt.yaml arch/arm64/boot/dts/broadcom/bcm2711-rpi-4-b.dt.yaml arch/arm/boot/dts/bcm2711-rpi-400.dt.yaml arch/arm/boot/dts/bcm2711-rpi-4-b.dt.yaml mdio@e14: #size-cells:0:0: 0 was expected arch/arm64/boot/dts/broadcom/bcm2711-rpi-400.dt.yaml arch/arm64/boot/dts/broadcom/bcm2711-rpi-4-b.dt.yaml arch/arm/boot/dts/bcm2711-rpi-400.dt.yaml arch/arm/boot/dts/bcm2711-rpi-4-b.dt.yaml

Re: [Linaro-mm-sig] [PATCH net-next 3/5] net: bcmasp: Add support for ASP2.0 Ethernet controller

by Andrew Lunn

> +static int bcmasp_probe(struct platform_device *pdev) > +{ > + struct bcmasp_priv *priv; > + struct device_node *ports_node, *intf_node; > + struct device *dev = &pdev->dev; > + int ret, i, wol_irq, count = 0; > + struct bcmasp_intf *intf; > + struct resource *r; > + u32 u32_reserved_filters_bitmask; > + DECLARE_BITMAP(reserved_filters_bitmask, ASP_RX_NET_FILTER_MAX); > + > + priv = devm_kzalloc(dev, sizeof(*priv), GFP_KERNEL); > + if (!priv) > + return -ENOMEM; > + > + priv->irq = platform_get_irq(pdev, 0); > + if (priv->irq <= 0) { > + dev_err(dev, "invalid interrupt\n"); > + return -EINVAL; > + } > + > + priv->clk = devm_clk_get(dev, "sw_asp"); > + if (IS_ERR(priv->clk)) { > + if (PTR_ERR(priv->clk) == -EPROBE_DEFER) > + return -EPROBE_DEFER; > + dev_warn(dev, "failed to request clock\n"); > + priv->clk = NULL; > + } devm_clk_get_optional() makes this simpler. > + > + /* Base from parent node */ > + r = platform_get_resource(pdev, IORESOURCE_MEM, 0); > + priv->base = devm_ioremap_resource(&pdev->dev, r); > + if (IS_ERR(priv->base)) { > + dev_err(dev, "failed to iomap\n"); > + return PTR_ERR(priv->base); > + } > + > + ret = dma_set_mask_and_coherent(&pdev->dev, DMA_BIT_MASK(40)); > + if (ret) > + ret = dma_set_mask_and_coherent(&pdev->dev, DMA_BIT_MASK(32)); > + if (ret) { > + dev_err(&pdev->dev, "unable to set DMA mask: %d\n", ret); > + return ret; > + } > + > + dev_set_drvdata(&pdev->dev, priv); > + priv->pdev = pdev; > + spin_lock_init(&priv->mda_lock); > + spin_lock_init(&priv->clk_lock); > + mutex_init(&priv->net_lock); > + > + ret = clk_prepare_enable(priv->clk); > + if (ret) > + return ret; > + > + /* Enable all clocks to ensure successful probing */ > + bcmasp_core_clock_set(priv, ASP_CTRL_CLOCK_CTRL_ASP_ALL_DISABLE, 0); > + > + /* Switch to the main clock */ > + bcmasp_core_clock_select(priv, false); > + > + _intr2_mask_set(priv, 0xffffffff); > + intr2_core_wl(priv, 0xffffffff, ASP_INTR2_CLEAR); > + > + ret = devm_request_irq(&pdev->dev, priv->irq,
bcmasp_isr, 0, > + pdev->name, priv); > + if (ret) { > + dev_err(dev, "failed to request ASP interrupt: %d\n", ret); > + return ret; > + } Do you need to undo clk_prepare_enable()? > + > + /* Register mdio child nodes */ > + of_platform_populate(dev->of_node, bcmasp_mdio_of_match, NULL, > + dev); > + > + ret = of_property_read_u32(dev->of_node, > + "brcm,reserved-net-filters-mask", > + &u32_reserved_filters_bitmask); > + if (ret) > + u32_reserved_filters_bitmask = 0; > + > + priv->net_filters_count_max = ASP_RX_NET_FILTER_MAX; > + bitmap_zero(reserved_filters_bitmask, priv->net_filters_count_max); > + bitmap_from_arr32(reserved_filters_bitmask, > + &u32_reserved_filters_bitmask, > + priv->net_filters_count_max); > + > + /* Discover bitmask of reserved filters */ > + for_each_set_bit(i, reserved_filters_bitmask, ASP_RX_NET_FILTER_MAX) { > + priv->net_filters[i].reserved = true; > + priv->net_filters_count_max--; > + } > + > + /* > + * ASP specific initialization, Needs to be done irregardless of > + * of how many interfaces come up. > + */ > + bcmasp_core_init(priv); > + bcmasp_core_init_filters(priv); > + > + ports_node = of_find_node_by_name(dev->of_node, "ethernet-ports"); > + if (!ports_node) { > + dev_warn(dev, "No ports found\n"); > + return 0; > + } > + > + priv->intf_count = of_get_available_child_count(ports_node); > + > + priv->intfs = devm_kcalloc(dev, priv->intf_count, > + sizeof(struct bcmasp_intf *), > + GFP_KERNEL); > + if (!priv->intfs) > + return -ENOMEM; > + > + /* Probe each interface (Initalization should continue even if > + * interfaces are unable to come up) > + */ > + i = 0; > + for_each_available_child_of_node(ports_node, intf_node) { > + wol_irq = platform_get_irq_optional(pdev, i + 1); > + priv->intfs[i++] = bcmasp_interface_create(priv, intf_node, > + wol_irq); > + } > + > + /* Drop the clock reference count now and let ndo_open()/ndo_close() > + * manage it for us from now on. 
> + */ > + bcmasp_core_clock_set(priv, 0, ASP_CTRL_CLOCK_CTRL_ASP_ALL_DISABLE); > + > + clk_disable_unprepare(priv->clk); > + > + /* Now do the registration of the network ports which will take care of > + * managing the clock properly. > + */ > + for (i = 0; i < priv->intf_count; i++) { > + intf = priv->intfs[i]; > + if (!intf) > + continue; > + > + ret = register_netdev(intf->ndev); > + if (ret) { > + netdev_err(intf->ndev, > + "failed to register net_device: %d\n", ret); > + bcmasp_interface_destroy(intf, false); > + continue; > + } > + count++; > + } > + > + dev_info(dev, "Initialized %d port(s)\n", count); > + > + return 0; > +} > + > +static int bcmasp_remove(struct platform_device *pdev) > +{ > + struct bcmasp_priv *priv = dev_get_drvdata(&pdev->dev); > + struct bcmasp_intf *intf; > + int i; > + > + for (i = 0; i < priv->intf_count; i++) { > + intf = priv->intfs[i]; > + if (!intf) > + continue; > + > + bcmasp_interface_destroy(intf, true); > + } > + > + return 0; > +} Do you need to depopulate the mdio children? > +static void bcmasp_get_drvinfo(struct net_device *dev, > + struct ethtool_drvinfo *info) > +{ > + strlcpy(info->driver, "bcmasp", sizeof(info->driver)); > + strlcpy(info->version, "v2.0", sizeof(info->version)); Please drop version. The core will fill it in with the kernel version, which is more useful. > +static int bcmasp_nway_reset(struct net_device *dev) > +{ > + if (!dev->phydev) > + return -ENODEV; > + > + return genphy_restart_aneg(dev->phydev); > +} phy_ethtool_nway_reset(). > +static void bcmasp_get_wol(struct net_device *dev, struct ethtool_wolinfo *wol) > +{ > + struct bcmasp_intf *intf = netdev_priv(dev); > + > + wol->supported = WAKE_MAGIC | WAKE_MAGICSECURE | WAKE_FILTER; > + wol->wolopts = intf->wolopts; > + memset(wol->sopass, 0, sizeof(wol->sopass)); > + > + if (wol->wolopts & WAKE_MAGICSECURE) > + memcpy(wol->sopass, intf->sopass, sizeof(intf->sopass)); > +} Maybe consider calling into the PHY to see what it can do? 
If the PHY can do the WoL you want, it will do it with less power. > +static int bcmasp_set_priv_flags(struct net_device *dev, u32 flags) > +{ > + struct bcmasp_intf *intf = netdev_priv(dev); > + > + intf->wol_keep_rx_en = flags & BCMASP_WOL_KEEP_RX_EN ? 1 : 0; > + > + return 0; Please could you explain this some more. How can you disable RX and still have WoL working? > +static void bcmasp_adj_link(struct net_device *dev) > +{ > + struct bcmasp_intf *intf = netdev_priv(dev); > + struct phy_device *phydev = dev->phydev; > + int changed = 0; > + u32 cmd_bits = 0, reg; > + > + if (intf->old_link != phydev->link) { > + changed = 1; > + intf->old_link = phydev->link; > + } > + > + if (intf->old_duplex != phydev->duplex) { > + changed = 1; > + intf->old_duplex = phydev->duplex; > + } > + > + switch (phydev->speed) { > + case SPEED_2500: > + cmd_bits = UMC_CMD_SPEED_2500; All I've seen is references to RGMII. Is 2500 possible? > + break; > + case SPEED_1000: > + cmd_bits = UMC_CMD_SPEED_1000; > + break; > + case SPEED_100: > + cmd_bits = UMC_CMD_SPEED_100; > + break; > + case SPEED_10: > + cmd_bits = UMC_CMD_SPEED_10; > + break; > + default: > + break; > + } > + cmd_bits <<= UMC_CMD_SPEED_SHIFT; > + > + if (phydev->duplex == DUPLEX_HALF) > + cmd_bits |= UMC_CMD_HD_EN; > + > + if (intf->old_pause != phydev->pause) { > + changed = 1; > + intf->old_pause = phydev->pause; > + } > + > + if (!phydev->pause) > + cmd_bits |= UMC_CMD_RX_PAUSE_IGNORE | UMC_CMD_TX_PAUSE_IGNORE; > + > + if (!changed) > + return; Shouldn't there be a comparison intf->old_speed != phydev->speed? Aren't you risking that the PHY changes speed without doing a link down/up?
> + > + if (phydev->link) { > + reg = umac_rl(intf, UMC_CMD); > + reg &= ~((UMC_CMD_SPEED_MASK << UMC_CMD_SPEED_SHIFT) | > + UMC_CMD_HD_EN | UMC_CMD_RX_PAUSE_IGNORE | > + UMC_CMD_TX_PAUSE_IGNORE); > + reg |= cmd_bits; > + umac_wl(intf, reg, UMC_CMD); > + > + /* Enable RGMII pad */ > + reg = rgmii_rl(intf, RGMII_OOB_CNTRL); > + reg |= RGMII_MODE_EN; > + rgmii_wl(intf, reg, RGMII_OOB_CNTRL); > + > + intf->eee.eee_active = phy_init_eee(phydev, 0) >= 0; > + bcmasp_eee_enable_set(intf, intf->eee.eee_active); > + } else { > + /* Disable RGMII pad */ > + reg = rgmii_rl(intf, RGMII_OOB_CNTRL); > + reg &= ~RGMII_MODE_EN; > + rgmii_wl(intf, reg, RGMII_OOB_CNTRL); > + } > + > + if (changed) > + phy_print_status(phydev); There has already been a return if !changed. > +static void bcmasp_configure_port(struct bcmasp_intf *intf) > +{ > + u32 reg, id_mode_dis = 0; > + > + reg = rgmii_rl(intf, RGMII_PORT_CNTRL); > + reg &= ~RGMII_PORT_MODE_MASK; > + > + switch (intf->phy_interface) { > + case PHY_INTERFACE_MODE_RGMII: > + /* RGMII_NO_ID: TXC transitions at the same time as TXD > + * (requires PCB or receiver-side delay) > + * RGMII: Add 2ns delay on TXC (90 degree shift) > + * > + * ID is implicitly disabled for 100Mbps (RG)MII operation. > + */ > + id_mode_dis = RGMII_ID_MODE_DIS; > + fallthrough; > + case PHY_INTERFACE_MODE_RGMII_TXID: > + reg |= RGMII_PORT_MODE_EXT_GPHY; > + break; > + case PHY_INTERFACE_MODE_MII: > + reg |= RGMII_PORT_MODE_EXT_EPHY; > + break; > + default: > + break; > + } Can we skip this and let the PHY do the delays? Ah, "This is an ugly quirk..." Maybe add a comment here pointing towards bcmasp_netif_init(), which explains this.
> +static int bcmasp_netif_init(struct net_device *dev, bool phy_connect, > + bool init_rx) > +{ > + struct bcmasp_intf *intf = netdev_priv(dev); > + phy_interface_t phy_iface = intf->phy_interface; > + u32 phy_flags = PHY_BRCM_AUTO_PWRDWN_ENABLE | > + PHY_BRCM_DIS_TXCRXC_NOENRGY | > + PHY_BRCM_IDDQ_SUSPEND; > + struct phy_device *phydev = NULL; > + int ret; > + > + /* Always enable interface clocks */ > + bcmasp_core_clock_set_intf(intf, true); > + > + /* Enable internal PHY before any MAC activity */ > + if (intf->internal_phy) > + bcmasp_ephy_enable_set(intf, true); > + > + bcmasp_configure_port(intf); > + > + /* This is an ugly quirk but we have not been correctly interpreting > + * the phy_interface values and we have done that across different > + * drivers, so at least we are consistent in our mistakes. > + * > + * When the Generic PHY driver is in use either the PHY has been > + * strapped or programmed correctly by the boot loader so we should > + * stick to our incorrect interpretation since we have validated it. > + * > + * Now when a dedicated PHY driver is in use, we need to reverse the > + * meaning of the phy_interface_mode values to something that the PHY > + * driver will interpret and act on such that we have two mistakes > + * canceling themselves so to speak. We only do this for the two > + * modes that GENET driver officially supports on Broadcom STB chips: > + * PHY_INTERFACE_MODE_RGMII and PHY_INTERFACE_MODE_RGMII_TXID. Other > + * modes are not *officially* supported with the boot loader and the > + * scripted environment generating Device Tree blobs for those > + * platforms. > + * > + * Note that internal PHY and fixed-link configurations are not > + * affected because they use different phy_interface_t values or the > + * Generic PHY driver. 
> + */ > +static inline void bcmasp_map_res(struct bcmasp_priv *priv, > + struct bcmasp_intf *intf) > +{ > + /* Per port */ > + intf->res.umac = priv->base + UMC_OFFSET(intf); > + intf->res.umac2fb = priv->base + UMAC2FB_OFFSET(intf); > + intf->res.rgmii = priv->base + RGMII_OFFSET(intf); > + > + /* Per ch */ > + intf->tx_spb_dma = priv->base + TX_SPB_DMA_OFFSET(intf); > + intf->res.tx_spb_ctrl = priv->base + TX_SPB_CTRL_OFFSET(intf); > + /* > + * Stop gap solution. This should be removed when 72165a0 is > + * deprecated > + */ Is that an internal commit? Andrew

Re: [Linaro-mm-sig] [PATCH net-next 0/5] brcm ASP 2.0 Ethernet controller

by Andrew Lunn

On Fri, Sep 24, 2021 at 02:44:46PM -0700, Justin Chen wrote: > This patch set adds support for Broadcom's ASP 2.0 Ethernet controller. Hi Justin Does the hardware support L2 switching between the two ports? I'm just wondering if later this is going to be modified into a switchdev driver? Andrew

[PATCH 01/27] dma-buf: add dma_resv_for_each_fence_unlocked v6

by Christian König

Abstract the complexity of iterating over all the fences in a dma_resv object. The new loop handles the whole RCU and retry dance and returns only fences where we can be sure we grabbed the right one. v2: fix accessing the shared fences while they might be freed, improve kerneldoc, rename _cursor to _iter, add dma_resv_iter_is_exclusive, add dma_resv_iter_begin/end v3: restructor the code, move rcu_read_lock()/unlock() into the iterator, add dma_resv_iter_is_restarted() v4: fix NULL deref when no explicit fence exists, drop superflous rcu_read_lock()/unlock() calls. v5: fix typos in the documentation v6: fix coding error when excl fence is NULL Signed-off-by: Christian König <christian.koenig(a)amd.com> --- drivers/dma-buf/dma-resv.c | 98 ++++++++++++++++++++++++++++++++++++++ include/linux/dma-resv.h | 95 ++++++++++++++++++++++++++++++++++++ 2 files changed, 193 insertions(+) diff --git a/drivers/dma-buf/dma-resv.c b/drivers/dma-buf/dma-resv.c index 84fbe60629e3..97af397304f3 100644 --- a/drivers/dma-buf/dma-resv.c +++ b/drivers/dma-buf/dma-resv.c @@ -323,6 +323,104 @@ void dma_resv_add_excl_fence(struct dma_resv *obj, struct dma_fence *fence) } EXPORT_SYMBOL(dma_resv_add_excl_fence); +/** + * dma_resv_iter_restart_unlocked - restart the unlocked iterator + * @cursor: The dma_resv_iter object to restart + * + * Restart the unlocked iteration by initializing the cursor object. + */ +static void dma_resv_iter_restart_unlocked(struct dma_resv_iter *cursor) +{ + cursor->seq = read_seqcount_begin(&cursor->obj->seq); + cursor->index = -1; + if (cursor->all_fences) + cursor->fences = dma_resv_shared_list(cursor->obj); + else + cursor->fences = NULL; + cursor->is_restarted = true; +} + +/** + * dma_resv_iter_walk_unlocked - walk over fences in a dma_resv obj + * @cursor: cursor to record the current position + * + * Return all the fences in the dma_resv object which are not yet signaled. + * The returned fence has an extra local reference so will stay alive. 
+ * If a concurrent modify is detected the whole iteration is started over again. + */ +static void dma_resv_iter_walk_unlocked(struct dma_resv_iter *cursor) +{ + struct dma_resv *obj = cursor->obj; + + do { + /* Drop the reference from the previous round */ + dma_fence_put(cursor->fence); + + if (cursor->index == -1) { + cursor->fence = dma_resv_excl_fence(obj); + cursor->index++; + if (!cursor->fence) + continue; + + } else if (!cursor->fences || + cursor->index >= cursor->fences->shared_count) { + cursor->fence = NULL; + break; + + } else { + struct dma_resv_list *fences = cursor->fences; + unsigned int idx = cursor->index++; + + cursor->fence = rcu_dereference(fences->shared[idx]); + } + cursor->fence = dma_fence_get_rcu(cursor->fence); + } while (cursor->fence && dma_fence_is_signaled(cursor->fence)); +} + +/** + * dma_resv_iter_first_unlocked - first fence in an unlocked dma_resv obj. + * @cursor: the cursor with the current position + * + * Returns the first fence from an unlocked dma_resv obj. + */ +struct dma_fence *dma_resv_iter_first_unlocked(struct dma_resv_iter *cursor) +{ + rcu_read_lock(); + do { + dma_resv_iter_restart_unlocked(cursor); + dma_resv_iter_walk_unlocked(cursor); + } while (read_seqcount_retry(&cursor->obj->seq, cursor->seq)); + rcu_read_unlock(); + + return cursor->fence; +} +EXPORT_SYMBOL(dma_resv_iter_first_unlocked); + +/** + * dma_resv_iter_next_unlocked - next fence in an unlocked dma_resv obj. + * @cursor: the cursor with the current position + * + * Returns the next fence from an unlocked dma_resv obj. 
+ */ +struct dma_fence *dma_resv_iter_next_unlocked(struct dma_resv_iter *cursor) +{ + bool restart; + + rcu_read_lock(); + cursor->is_restarted = false; + restart = read_seqcount_retry(&cursor->obj->seq, cursor->seq); + do { + if (restart) + dma_resv_iter_restart_unlocked(cursor); + dma_resv_iter_walk_unlocked(cursor); + restart = true; + } while (read_seqcount_retry(&cursor->obj->seq, cursor->seq)); + rcu_read_unlock(); + + return cursor->fence; +} +EXPORT_SYMBOL(dma_resv_iter_next_unlocked); + /** * dma_resv_copy_fences - Copy all fences from src to dst. * @dst: the destination reservation object diff --git a/include/linux/dma-resv.h b/include/linux/dma-resv.h index 9100dd3dc21f..5d7d28cb9008 100644 --- a/include/linux/dma-resv.h +++ b/include/linux/dma-resv.h @@ -149,6 +149,101 @@ struct dma_resv { struct dma_resv_list __rcu *fence; }; +/** + * struct dma_resv_iter - current position into the dma_resv fences + * + * Don't touch this directly in the driver, use the accessor function instead. 
+ */ +struct dma_resv_iter { + /** @obj: The dma_resv object we iterate over */ + struct dma_resv *obj; + + /** @all_fences: If all fences should be returned */ + bool all_fences; + + /** @fence: the currently handled fence */ + struct dma_fence *fence; + + /** @seq: sequence number to check for modifications */ + unsigned int seq; + + /** @index: index into the shared fences */ + unsigned int index; + + /** @fences: the shared fences */ + struct dma_resv_list *fences; + + /** @is_restarted: true if this is the first returned fence */ + bool is_restarted; +}; + +struct dma_fence *dma_resv_iter_first_unlocked(struct dma_resv_iter *cursor); +struct dma_fence *dma_resv_iter_next_unlocked(struct dma_resv_iter *cursor); + +/** + * dma_resv_iter_begin - initialize a dma_resv_iter object + * @cursor: The dma_resv_iter object to initialize + * @obj: The dma_resv object which we want to iterate over + * @all_fences: If all fences should be returned or just the exclusive one + */ +static inline void dma_resv_iter_begin(struct dma_resv_iter *cursor, + struct dma_resv *obj, + bool all_fences) +{ + cursor->obj = obj; + cursor->all_fences = all_fences; + cursor->fence = NULL; +} + +/** + * dma_resv_iter_end - cleanup a dma_resv_iter object + * @cursor: the dma_resv_iter object which should be cleaned up + * + * Make sure that the reference to the fence in the cursor is properly + * dropped. + */ +static inline void dma_resv_iter_end(struct dma_resv_iter *cursor) +{ + dma_fence_put(cursor->fence); +} + +/** + * dma_resv_iter_is_exclusive - test if the current fence is the exclusive one + * @cursor: the cursor of the current position + * + * Returns true if the currently returned fence is the exclusive one. 
+ */ +static inline bool dma_resv_iter_is_exclusive(struct dma_resv_iter *cursor) +{ + return cursor->index == -1; +} + +/** + * dma_resv_iter_is_restarted - test if this is the first fence after a restart + * @cursor: the cursor with the current position + * + * Return true if this is the first fence in an iteration after a restart. + */ +static inline bool dma_resv_iter_is_restarted(struct dma_resv_iter *cursor) +{ + return cursor->is_restarted; +} + +/** + * dma_resv_for_each_fence_unlocked - unlocked fence iterator + * @cursor: a struct dma_resv_iter pointer + * @fence: the current fence + * + * Iterate over the fences in a struct dma_resv object without holding the + * &dma_resv.lock and using RCU instead. The cursor needs to be initialized + * with dma_resv_iter_begin() and cleaned up with dma_resv_iter_end(). Inside + * the iterator a reference to the dma_fence is held and the RCU lock dropped. + * When the dma_resv is modified the iteration starts over again. + */ +#define dma_resv_for_each_fence_unlocked(cursor, fence) \ + for (fence = dma_resv_iter_first_unlocked(cursor); \ + fence; fence = dma_resv_iter_next_unlocked(cursor)) + #define dma_resv_held(obj) lockdep_is_held(&(obj)->lock.base) #define dma_resv_assert_held(obj) lockdep_assert_held(&(obj)->lock.base) -- 2.25.1

Re: [Linaro-mm-sig] [RFC PATCH 2/4] DRM: Add support of AI Processor Unit (APU)

by Christian König

Am 23.09.21 um 02:58 schrieb Dave Airlie: > On Sat, 18 Sept 2021 at 07:57, Alexandre Bailon <abailon(a)baylibre.com> wrote: >> Some Mediatek SoC provides hardware accelerator for AI / ML. >> This driver provides the infrastructure to manage memory >> shared between host CPU and the accelerator, and to submit >> jobs to the accelerator. >> The APU itself is managed by remoteproc so this drivers >> relies on remoteproc to found the APU and get some important data >> from it. But, the driver is quite generic and it should possible >> to manage accelerator using another ways. >> This driver doesn't manage itself the data transmitions. >> It must be registered by another driver implementing the transmitions. >> >> Signed-off-by: Alexandre Bailon <abailon(a)baylibre.com> >> [SNIP] >> Please refer to >> https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fwww.kerne… >> >> here and below in many places. >> >> There's a lot of missing padding/alignment here. There is also the pahole utility which shows you nicely where you need padding for your IOCTL structures. For example "pahole drivers/gpu/drm/amd/amdgpu/amdgpu.ko -C drm_amdgpu_gem_va" gives you: struct drm_amdgpu_gem_va { __u32 handle; /* 0 4 */ __u32 _pad; /* 4 4 */ __u32 operation; /* 8 4 */ __u32 flags; /* 12 4 */ __u64 va_address; /* 16 8 */ __u64 offset_in_bo; /* 24 8 */ __u64 map_size; /* 32 8 */ /* size: 40, cachelines: 1, members: 7 */ /* last cacheline: 40 bytes */ }; And as you can see we have added the _pad field to our IOCTL parameter structure to properly align the 64-bit members. Regards, Christian. >> >> I'm trying to find the time to review this stack in full, any writeups >> on how this is used from userspace would be useful (not just the code >> repo, but some sort of how do I get at it) it reads as kinda generic >> (calling it apu), but then has some specifics around device binding. >> >> Dave.

Deploying new iterator interface for dma-buf

by Christian König

Hi guys, The version I sent out yesterday had a rather obvious coding error and I honestly forgot the cover letter. This one here is better tested and will now hopefully not be torn apart by the CI system immediately. I tried to address all review and documentation comments as best as I could, so I'm hoping that we can now consider pushing this. Cheers, Christian.

Re: [Linaro-mm-sig] [PATCH v3 4/9] drm/scheduler: Add fence deadline support

by Rob Clark

On Wed, Sep 22, 2021 at 7:31 AM Andrey Grodzovsky <andrey.grodzovsky(a)amd.com> wrote: > > > On 2021-09-21 11:32 p.m., Rob Clark wrote: > > On Tue, Sep 21, 2021 at 7:18 PM Andrey Grodzovsky > > <andrey.grodzovsky(a)amd.com> wrote: > >> > >> On 2021-09-21 4:47 p.m., Rob Clark wrote: > >>> On Tue, Sep 21, 2021 at 1:09 PM Andrey Grodzovsky > >>> <andrey.grodzovsky(a)amd.com> wrote: > >>>> On 2021-09-03 2:47 p.m., Rob Clark wrote: > >>>> > >>>>> From: Rob Clark <robdclark(a)chromium.org> > >>>>> > >>>>> As the finished fence is the one that is exposed to userspace, and > >>>>> therefore the one that other operations, like atomic update, would > >>>>> block on, we need to propagate the deadline from from the finished > >>>>> fence to the actual hw fence. > >>>>> > >>>>> v2: Split into drm_sched_fence_set_parent() (ckoenig) > >>>>> > >>>>> Signed-off-by: Rob Clark <robdclark(a)chromium.org> > >>>>> --- > >>>>> drivers/gpu/drm/scheduler/sched_fence.c | 34 +++++++++++++++++++++++++ > >>>>> drivers/gpu/drm/scheduler/sched_main.c | 2 +- > >>>>> include/drm/gpu_scheduler.h | 8 ++++++ > >>>>> 3 files changed, 43 insertions(+), 1 deletion(-) > >>>>> > >>>>> diff --git a/drivers/gpu/drm/scheduler/sched_fence.c b/drivers/gpu/drm/scheduler/sched_fence.c > >>>>> index bcea035cf4c6..4fc41a71d1c7 100644 > >>>>> --- a/drivers/gpu/drm/scheduler/sched_fence.c > >>>>> +++ b/drivers/gpu/drm/scheduler/sched_fence.c > >>>>> @@ -128,6 +128,30 @@ static void drm_sched_fence_release_finished(struct dma_fence *f) > >>>>> dma_fence_put(&fence->scheduled); > >>>>> } > >>>>> > >>>>> +static void drm_sched_fence_set_deadline_finished(struct dma_fence *f, > >>>>> + ktime_t deadline) > >>>>> +{ > >>>>> + struct drm_sched_fence *fence = to_drm_sched_fence(f); > >>>>> + unsigned long flags; > >>>>> + > >>>>> + spin_lock_irqsave(&fence->lock, flags); > >>>>> + > >>>>> + /* If we already have an earlier deadline, keep it: */ > >>>>> + if (test_bit(DMA_FENCE_FLAG_HAS_DEADLINE_BIT, &f->flags) && > >>>>> + 
ktime_before(fence->deadline, deadline)) { > >>>>> + spin_unlock_irqrestore(&fence->lock, flags); > >>>>> + return; > >>>>> + } > >>>>> + > >>>>> + fence->deadline = deadline; > >>>>> + set_bit(DMA_FENCE_FLAG_HAS_DEADLINE_BIT, &f->flags); > >>>>> + > >>>>> + spin_unlock_irqrestore(&fence->lock, flags); > >>>>> + > >>>>> + if (fence->parent) > >>>>> + dma_fence_set_deadline(fence->parent, deadline); > >>>>> +} > >>>>> + > >>>>> static const struct dma_fence_ops drm_sched_fence_ops_scheduled = { > >>>>> .get_driver_name = drm_sched_fence_get_driver_name, > >>>>> .get_timeline_name = drm_sched_fence_get_timeline_name, > >>>>> @@ -138,6 +162,7 @@ static const struct dma_fence_ops drm_sched_fence_ops_finished = { > >>>>> .get_driver_name = drm_sched_fence_get_driver_name, > >>>>> .get_timeline_name = drm_sched_fence_get_timeline_name, > >>>>> .release = drm_sched_fence_release_finished, > >>>>> + .set_deadline = drm_sched_fence_set_deadline_finished, > >>>>> }; > >>>>> > >>>>> struct drm_sched_fence *to_drm_sched_fence(struct dma_fence *f) > >>>>> @@ -152,6 +177,15 @@ struct drm_sched_fence *to_drm_sched_fence(struct dma_fence *f) > >>>>> } > >>>>> EXPORT_SYMBOL(to_drm_sched_fence); > >>>>> > >>>>> +void drm_sched_fence_set_parent(struct drm_sched_fence *s_fence, > >>>>> + struct dma_fence *fence) > >>>>> +{ > >>>>> + s_fence->parent = dma_fence_get(fence); > >>>>> + if (test_bit(DMA_FENCE_FLAG_HAS_DEADLINE_BIT, > >>>>> + &s_fence->finished.flags)) > >>>>> + dma_fence_set_deadline(fence, s_fence->deadline); > >>>> I believe above you should pass be s_fence->finished to > >>>> dma_fence_set_deadline > >>>> instead it fence which is the HW fence itself. > >>> Hmm, unless this has changed recently with some patches I don't have, > >>> s_fence->parent is the one signalled by hw, so it is the one we want > >>> to set the deadline on > >>> > >>> BR, > >>> -R > >> > >> No it didn't change. 
But then when exactly will > >> drm_sched_fence_set_deadline_finished > >> execute such that fence->parent != NULL ? In other words, I am not clear > >> how propagation > >> happens otherwise - if dma_fence_set_deadline is called with the HW > >> fence then the assumption > >> here is that driver provided driver specific > >> dma_fence_ops.dma_fence_set_deadline callback executes > >> but I was under impression that drm_sched_fence_set_deadline_finished is > >> the one that propagates > >> the deadline to the HW fence's callback and for it to execute > >> dma_fence_set_deadline needs to be called > >> with s_fence->finished. > > Assuming I didn't screw up drm/msm conversion to scheduler, > > &s_fence->finished is the one that will be returned to userspace.. and > > later passed back to kernel for atomic commit (or to the compositor). > > So it is the one that fence->set_deadline() will be called on. But > > s_fence->parent is the actual hw fence that needs to know about the > > deadline. Depending on whether or not the job has been written into > > hw ringbuffer or not, there are two cases: > > > > 1) not scheduled yet, s_fence will store the deadline and propagate it > > later once s_fence->parent is known > > > And by later you mean the call to drm_sched_fence_set_parent > after HW fence is returned ? If yes I think i get it now. 
Yup :-) BR, -R > Andrey > > > > 2) already scheduled, in which case s_fence->finished.set_deadline > > will propagate it directly to the real fence > > > > BR, > > -R > > > >> Andrey > >> > >> > >> > >>>> Andrey > >>>> > >>>> > >>>>> +} > >>>>> + > >>>>> struct drm_sched_fence *drm_sched_fence_alloc(struct drm_sched_entity *entity, > >>>>> void *owner) > >>>>> { > >>>>> diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c > >>>>> index 595e47ff7d06..27bf0ac0625f 100644 > >>>>> --- a/drivers/gpu/drm/scheduler/sched_main.c > >>>>> +++ b/drivers/gpu/drm/scheduler/sched_main.c > >>>>> @@ -978,7 +978,7 @@ static int drm_sched_main(void *param) > >>>>> drm_sched_fence_scheduled(s_fence); > >>>>> > >>>>> if (!IS_ERR_OR_NULL(fence)) { > >>>>> - s_fence->parent = dma_fence_get(fence); > >>>>> + drm_sched_fence_set_parent(s_fence, fence); > >>>>> r = dma_fence_add_callback(fence, &sched_job->cb, > >>>>> drm_sched_job_done_cb); > >>>>> if (r == -ENOENT) > >>>>> diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h > >>>>> index 7f77a455722c..158ddd662469 100644 > >>>>> --- a/include/drm/gpu_scheduler.h > >>>>> +++ b/include/drm/gpu_scheduler.h > >>>>> @@ -238,6 +238,12 @@ struct drm_sched_fence { > >>>>> */ > >>>>> struct dma_fence finished; > >>>>> > >>>>> + /** > >>>>> + * @deadline: deadline set on &drm_sched_fence.finished which > >>>>> + * potentially needs to be propagated to &drm_sched_fence.parent > >>>>> + */ > >>>>> + ktime_t deadline; > >>>>> + > >>>>> /** > >>>>> * @parent: the fence returned by &drm_sched_backend_ops.run_job > >>>>> * when scheduling the job on hardware. 
We signal the > >>>>> @@ -505,6 +511,8 @@ void drm_sched_entity_set_priority(struct drm_sched_entity *entity, > >>>>> enum drm_sched_priority priority); > >>>>> bool drm_sched_entity_is_ready(struct drm_sched_entity *entity); > >>>>> > >>>>> +void drm_sched_fence_set_parent(struct drm_sched_fence *s_fence, > >>>>> + struct dma_fence *fence); > >>>>> struct drm_sched_fence *drm_sched_fence_alloc( > >>>>> struct drm_sched_entity *s_entity, void *owner); > >>>>> void drm_sched_fence_init(struct drm_sched_fence *fence,
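The two cases Rob describes can be modeled in plain userspace C. In case 1, the deadline arrives before run_job() has returned the hardware fence. In case 2, it arrives after. This is a simplified sketch, not the scheduler code. The struct and function names are hypothetical, deadlines are plain nanosecond integers instead of ktime_t, and the spinlock, reference counting and DMA_FENCE_FLAG_HAS_DEADLINE_BIT handling of the real patch are omitted. As in the patch, an earlier deadline that is already stored is kept. The model assumes the hardware side applies the same rule.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the hardware fence (s_fence->parent). */
struct model_hw_fence {
	bool has_deadline;
	int64_t deadline_ns;
};

/* Hypothetical stand-in for the scheduler fence whose "finished" part is exported. */
struct model_sched_fence {
	bool has_deadline;
	int64_t deadline_ns;
	struct model_hw_fence *parent; /* NULL until run_job() returns */
};

/* Assumption: the hardware side also keeps only the earliest deadline. */
static void model_hw_set_deadline(struct model_hw_fence *hw, int64_t deadline_ns)
{
	if (hw->has_deadline && hw->deadline_ns < deadline_ns)
		return;
	hw->deadline_ns = deadline_ns;
	hw->has_deadline = true;
}

/* set_deadline on the finished fence: always record it, forward it only
 * if the job has already been handed to the hardware (case 2). */
static void model_finished_set_deadline(struct model_sched_fence *s, int64_t deadline_ns)
{
	/* An earlier deadline is already recorded: keep it. */
	if (s->has_deadline && s->deadline_ns < deadline_ns)
		return;
	s->deadline_ns = deadline_ns;
	s->has_deadline = true;

	if (s->parent)
		model_hw_set_deadline(s->parent, deadline_ns);
}

/* Attach the hardware fence and replay a deadline stored earlier (case 1). */
static void model_set_parent(struct model_sched_fence *s, struct model_hw_fence *hw)
{
	assert(s->parent == NULL); /* the hardware fence is attached only once */
	s->parent = hw;
	if (s->has_deadline)
		model_hw_set_deadline(hw, s->deadline_ns);
}
```

Calling model_finished_set_deadline() before model_set_parent() only records the deadline. The hardware fence sees it once model_set_parent() runs. This corresponds to the point where the patch makes drm_sched_main() call drm_sched_fence_set_parent().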

4 years, 4 months

Re: [Linaro-mm-sig] [PATCH v3 4/9] drm/scheduler: Add fence deadline support

by Rob Clark

On Tue, Sep 21, 2021 at 7:18 PM Andrey Grodzovsky <andrey.grodzovsky(a)amd.com> wrote: > > > On 2021-09-21 4:47 p.m., Rob Clark wrote: > > On Tue, Sep 21, 2021 at 1:09 PM Andrey Grodzovsky > > <andrey.grodzovsky(a)amd.com> wrote: > >> On 2021-09-03 2:47 p.m., Rob Clark wrote: > >> > >>> From: Rob Clark <robdclark(a)chromium.org> > >>> > >>> As the finished fence is the one that is exposed to userspace, and > >>> therefore the one that other operations, like atomic update, would > >>> block on, we need to propagate the deadline from from the finished > >>> fence to the actual hw fence. > >>> > >>> v2: Split into drm_sched_fence_set_parent() (ckoenig) > >>> > >>> Signed-off-by: Rob Clark <robdclark(a)chromium.org> > >>> --- > >>> drivers/gpu/drm/scheduler/sched_fence.c | 34 +++++++++++++++++++++++++ > >>> drivers/gpu/drm/scheduler/sched_main.c | 2 +- > >>> include/drm/gpu_scheduler.h | 8 ++++++ > >>> 3 files changed, 43 insertions(+), 1 deletion(-) > >>> > >>> diff --git a/drivers/gpu/drm/scheduler/sched_fence.c b/drivers/gpu/drm/scheduler/sched_fence.c > >>> index bcea035cf4c6..4fc41a71d1c7 100644 > >>> --- a/drivers/gpu/drm/scheduler/sched_fence.c > >>> +++ b/drivers/gpu/drm/scheduler/sched_fence.c > >>> @@ -128,6 +128,30 @@ static void drm_sched_fence_release_finished(struct dma_fence *f) > >>> dma_fence_put(&fence->scheduled); > >>> } > >>> > >>> +static void drm_sched_fence_set_deadline_finished(struct dma_fence *f, > >>> + ktime_t deadline) > >>> +{ > >>> + struct drm_sched_fence *fence = to_drm_sched_fence(f); > >>> + unsigned long flags; > >>> + > >>> + spin_lock_irqsave(&fence->lock, flags); > >>> + > >>> + /* If we already have an earlier deadline, keep it: */ > >>> + if (test_bit(DMA_FENCE_FLAG_HAS_DEADLINE_BIT, &f->flags) && > >>> + ktime_before(fence->deadline, deadline)) { > >>> + spin_unlock_irqrestore(&fence->lock, flags); > >>> + return; > >>> + } > >>> + > >>> + fence->deadline = deadline; > >>> + set_bit(DMA_FENCE_FLAG_HAS_DEADLINE_BIT, &f->flags); 
> >>> + > >>> + spin_unlock_irqrestore(&fence->lock, flags); > >>> + > >>> + if (fence->parent) > >>> + dma_fence_set_deadline(fence->parent, deadline); > >>> +} > >>> + > >>> static const struct dma_fence_ops drm_sched_fence_ops_scheduled = { > >>> .get_driver_name = drm_sched_fence_get_driver_name, > >>> .get_timeline_name = drm_sched_fence_get_timeline_name, > >>> @@ -138,6 +162,7 @@ static const struct dma_fence_ops drm_sched_fence_ops_finished = { > >>> .get_driver_name = drm_sched_fence_get_driver_name, > >>> .get_timeline_name = drm_sched_fence_get_timeline_name, > >>> .release = drm_sched_fence_release_finished, > >>> + .set_deadline = drm_sched_fence_set_deadline_finished, > >>> }; > >>> > >>> struct drm_sched_fence *to_drm_sched_fence(struct dma_fence *f) > >>> @@ -152,6 +177,15 @@ struct drm_sched_fence *to_drm_sched_fence(struct dma_fence *f) > >>> } > >>> EXPORT_SYMBOL(to_drm_sched_fence); > >>> > >>> +void drm_sched_fence_set_parent(struct drm_sched_fence *s_fence, > >>> + struct dma_fence *fence) > >>> +{ > >>> + s_fence->parent = dma_fence_get(fence); > >>> + if (test_bit(DMA_FENCE_FLAG_HAS_DEADLINE_BIT, > >>> + &s_fence->finished.flags)) > >>> + dma_fence_set_deadline(fence, s_fence->deadline); > >> > >> I believe above you should pass be s_fence->finished to > >> dma_fence_set_deadline > >> instead it fence which is the HW fence itself. > > Hmm, unless this has changed recently with some patches I don't have, > > s_fence->parent is the one signalled by hw, so it is the one we want > > to set the deadline on > > > > BR, > > -R > > > No it didn't change. But then when exactly will > drm_sched_fence_set_deadline_finished > execute such that fence->parent != NULL ? 
In other words, I am not clear > how propagation > happens otherwise - if dma_fence_set_deadline is called with the HW > fence then the assumption > here is that driver provided driver specific > dma_fence_ops.dma_fence_set_deadline callback executes > but I was under impression that drm_sched_fence_set_deadline_finished is > the one that propagates > the deadline to the HW fence's callback and for it to execute > dma_fence_set_deadline needs to be called > with s_fence->finished. Assuming I didn't screw up drm/msm conversion to scheduler, &s_fence->finished is the one that will be returned to userspace.. and later passed back to kernel for atomic commit (or to the compositor). So it is the one that fence->set_deadline() will be called on. But s_fence->parent is the actual hw fence that needs to know about the deadline. Depending on whether or not the job has been written into hw ringbuffer or not, there are two cases: 1) not scheduled yet, s_fence will store the deadline and propagate it later once s_fence->parent is known 2) already scheduled, in which case s_fence->finished.set_deadline will propagate it directly to the real fence BR, -R > Andrey > > > > > > >> Andrey > >> > >> > >>> +} > >>> + > >>> struct drm_sched_fence *drm_sched_fence_alloc(struct drm_sched_entity *entity, > >>> void *owner) > >>> { > >>> diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c > >>> index 595e47ff7d06..27bf0ac0625f 100644 > >>> --- a/drivers/gpu/drm/scheduler/sched_main.c > >>> +++ b/drivers/gpu/drm/scheduler/sched_main.c > >>> @@ -978,7 +978,7 @@ static int drm_sched_main(void *param) > >>> drm_sched_fence_scheduled(s_fence); > >>> > >>> if (!IS_ERR_OR_NULL(fence)) { > >>> - s_fence->parent = dma_fence_get(fence); > >>> + drm_sched_fence_set_parent(s_fence, fence); > >>> r = dma_fence_add_callback(fence, &sched_job->cb, > >>> drm_sched_job_done_cb); > >>> if (r == -ENOENT) > >>> diff --git a/include/drm/gpu_scheduler.h 
b/include/drm/gpu_scheduler.h > >>> index 7f77a455722c..158ddd662469 100644 > >>> --- a/include/drm/gpu_scheduler.h > >>> +++ b/include/drm/gpu_scheduler.h > >>> @@ -238,6 +238,12 @@ struct drm_sched_fence { > >>> */ > >>> struct dma_fence finished; > >>> > >>> + /** > >>> + * @deadline: deadline set on &drm_sched_fence.finished which > >>> + * potentially needs to be propagated to &drm_sched_fence.parent > >>> + */ > >>> + ktime_t deadline; > >>> + > >>> /** > >>> * @parent: the fence returned by &drm_sched_backend_ops.run_job > >>> * when scheduling the job on hardware. We signal the > >>> @@ -505,6 +511,8 @@ void drm_sched_entity_set_priority(struct drm_sched_entity *entity, > >>> enum drm_sched_priority priority); > >>> bool drm_sched_entity_is_ready(struct drm_sched_entity *entity); > >>> > >>> +void drm_sched_fence_set_parent(struct drm_sched_fence *s_fence, > >>> + struct dma_fence *fence); > >>> struct drm_sched_fence *drm_sched_fence_alloc( > >>> struct drm_sched_entity *s_entity, void *owner); > >>> void drm_sched_fence_init(struct drm_sched_fence *fence,

4 years, 4 months

Re: [Linaro-mm-sig] [PATCH v3 4/9] drm/scheduler: Add fence deadline support

by Rob Clark

On Tue, Sep 21, 2021 at 1:09 PM Andrey Grodzovsky <andrey.grodzovsky(a)amd.com> wrote: > > On 2021-09-03 2:47 p.m., Rob Clark wrote: > > > From: Rob Clark <robdclark(a)chromium.org> > > > > As the finished fence is the one that is exposed to userspace, and > > therefore the one that other operations, like atomic update, would > > block on, we need to propagate the deadline from from the finished > > fence to the actual hw fence. > > > > v2: Split into drm_sched_fence_set_parent() (ckoenig) > > > > Signed-off-by: Rob Clark <robdclark(a)chromium.org> > > --- > > drivers/gpu/drm/scheduler/sched_fence.c | 34 +++++++++++++++++++++++++ > > drivers/gpu/drm/scheduler/sched_main.c | 2 +- > > include/drm/gpu_scheduler.h | 8 ++++++ > > 3 files changed, 43 insertions(+), 1 deletion(-) > > > > diff --git a/drivers/gpu/drm/scheduler/sched_fence.c b/drivers/gpu/drm/scheduler/sched_fence.c > > index bcea035cf4c6..4fc41a71d1c7 100644 > > --- a/drivers/gpu/drm/scheduler/sched_fence.c > > +++ b/drivers/gpu/drm/scheduler/sched_fence.c > > @@ -128,6 +128,30 @@ static void drm_sched_fence_release_finished(struct dma_fence *f) > > dma_fence_put(&fence->scheduled); > > } > > > > +static void drm_sched_fence_set_deadline_finished(struct dma_fence *f, > > + ktime_t deadline) > > +{ > > + struct drm_sched_fence *fence = to_drm_sched_fence(f); > > + unsigned long flags; > > + > > + spin_lock_irqsave(&fence->lock, flags); > > + > > + /* If we already have an earlier deadline, keep it: */ > > + if (test_bit(DMA_FENCE_FLAG_HAS_DEADLINE_BIT, &f->flags) && > > + ktime_before(fence->deadline, deadline)) { > > + spin_unlock_irqrestore(&fence->lock, flags); > > + return; > > + } > > + > > + fence->deadline = deadline; > > + set_bit(DMA_FENCE_FLAG_HAS_DEADLINE_BIT, &f->flags); > > + > > + spin_unlock_irqrestore(&fence->lock, flags); > > + > > + if (fence->parent) > > + dma_fence_set_deadline(fence->parent, deadline); > > +} > > + > > static const struct dma_fence_ops drm_sched_fence_ops_scheduled = { 
> > .get_driver_name = drm_sched_fence_get_driver_name, > > .get_timeline_name = drm_sched_fence_get_timeline_name, > > @@ -138,6 +162,7 @@ static const struct dma_fence_ops drm_sched_fence_ops_finished = { > > .get_driver_name = drm_sched_fence_get_driver_name, > > .get_timeline_name = drm_sched_fence_get_timeline_name, > > .release = drm_sched_fence_release_finished, > > + .set_deadline = drm_sched_fence_set_deadline_finished, > > }; > > > > struct drm_sched_fence *to_drm_sched_fence(struct dma_fence *f) > > @@ -152,6 +177,15 @@ struct drm_sched_fence *to_drm_sched_fence(struct dma_fence *f) > > } > > EXPORT_SYMBOL(to_drm_sched_fence); > > > > +void drm_sched_fence_set_parent(struct drm_sched_fence *s_fence, > > + struct dma_fence *fence) > > +{ > > + s_fence->parent = dma_fence_get(fence); > > + if (test_bit(DMA_FENCE_FLAG_HAS_DEADLINE_BIT, > > + &s_fence->finished.flags)) > > + dma_fence_set_deadline(fence, s_fence->deadline); > > > I believe above you should pass be s_fence->finished to > dma_fence_set_deadline > instead it fence which is the HW fence itself. 
Hmm, unless this has changed recently with some patches I don't have, s_fence->parent is the one signalled by hw, so it is the one we want to set the deadline on BR, -R > Andrey > > > > +} > > + > > struct drm_sched_fence *drm_sched_fence_alloc(struct drm_sched_entity *entity, > > void *owner) > > { > > diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c > > index 595e47ff7d06..27bf0ac0625f 100644 > > --- a/drivers/gpu/drm/scheduler/sched_main.c > > +++ b/drivers/gpu/drm/scheduler/sched_main.c > > @@ -978,7 +978,7 @@ static int drm_sched_main(void *param) > > drm_sched_fence_scheduled(s_fence); > > > > if (!IS_ERR_OR_NULL(fence)) { > > - s_fence->parent = dma_fence_get(fence); > > + drm_sched_fence_set_parent(s_fence, fence); > > r = dma_fence_add_callback(fence, &sched_job->cb, > > drm_sched_job_done_cb); > > if (r == -ENOENT) > > diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h > > index 7f77a455722c..158ddd662469 100644 > > --- a/include/drm/gpu_scheduler.h > > +++ b/include/drm/gpu_scheduler.h > > @@ -238,6 +238,12 @@ struct drm_sched_fence { > > */ > > struct dma_fence finished; > > > > + /** > > + * @deadline: deadline set on &drm_sched_fence.finished which > > + * potentially needs to be propagated to &drm_sched_fence.parent > > + */ > > + ktime_t deadline; > > + > > /** > > * @parent: the fence returned by &drm_sched_backend_ops.run_job > > * when scheduling the job on hardware. We signal the > > @@ -505,6 +511,8 @@ void drm_sched_entity_set_priority(struct drm_sched_entity *entity, > > enum drm_sched_priority priority); > > bool drm_sched_entity_is_ready(struct drm_sched_entity *entity); > > > > +void drm_sched_fence_set_parent(struct drm_sched_fence *s_fence, > > + struct dma_fence *fence); > > struct drm_sched_fence *drm_sched_fence_alloc( > > struct drm_sched_entity *s_entity, void *owner); > > void drm_sched_fence_init(struct drm_sched_fence *fence,

4 years, 4 months

[PATCH 01/26] dma-buf: add dma_resv_for_each_fence_unlocked v3

by Christian König

Abstract the complexity of iterating over all the fences in a dma_resv object.

The new loop handles the whole RCU and retry dance and returns only fences where we can be sure we grabbed the right one.

v2: fix accessing the shared fences while they might be freed, improve kerneldoc, rename _cursor to _iter, add dma_resv_iter_is_exclusive, add dma_resv_iter_begin/end

v3: restructure the code, move rcu_read_lock()/unlock() into the iterator, add dma_resv_iter_is_restarted()

Signed-off-by: Christian König <christian.koenig(a)amd.com>
---
 drivers/dma-buf/dma-resv.c | 98 ++++++++++++++++++++++++++++++++++++++
 include/linux/dma-resv.h   | 95 ++++++++++++++++++++++++++++++++++++
 2 files changed, 193 insertions(+)

diff --git a/drivers/dma-buf/dma-resv.c b/drivers/dma-buf/dma-resv.c
index 84fbe60629e3..11b5399f4bd3 100644
--- a/drivers/dma-buf/dma-resv.c
+++ b/drivers/dma-buf/dma-resv.c
@@ -323,6 +323,104 @@ void dma_resv_add_excl_fence(struct dma_resv *obj, struct dma_fence *fence)
 }
 EXPORT_SYMBOL(dma_resv_add_excl_fence);
 
+/**
+ * dma_resv_iter_restart_unlocked - restart the unlocked iterator
+ * @cursor: The dma_resv_iter object to restart
+ *
+ * Restart the unlocked iteration by initializing the cursor object.
+ */
+static void dma_resv_iter_restart_unlocked(struct dma_resv_iter *cursor)
+{
+	cursor->seq = read_seqcount_begin(&cursor->obj->seq);
+	cursor->index = -1;
+	if (cursor->all_fences) {
+		rcu_read_lock();
+		cursor->fences = dma_resv_shared_list(cursor->obj);
+		rcu_read_unlock();
+	} else {
+		cursor->fences = NULL;
+	}
+	cursor->is_restarted = true;
+}
+
+/**
+ * dma_resv_iter_walk_unlocked - walk over fences in a dma_resv obj
+ * @cursor: cursor to record the current position
+ *
+ * Return all the fences in the dma_resv object which are not yet signaled.
+ * The returned fence has an extra local reference so will stay alive.
+ * If a concurrent modification is detected the whole iteration is started over again.
+ */
+static void dma_resv_iter_walk_unlocked(struct dma_resv_iter *cursor)
+{
+	struct dma_resv *obj = cursor->obj;
+
+	do {
+		/* Drop the reference from the previous round */
+		dma_fence_put(cursor->fence);
+
+		if (cursor->index++ == -1) {
+			cursor->fence = dma_resv_excl_fence(obj);
+			cursor->fence = dma_fence_get_rcu(cursor->fence);
+
+		} else if (!cursor->fences ||
+			   cursor->index >= cursor->fences->shared_count) {
+			cursor->fence = NULL;
+
+		} else {
+			struct dma_resv_list *fences = cursor->fences;
+			unsigned int idx = cursor->index;
+
+			cursor->fence = rcu_dereference(fences->shared[idx]);
+			cursor->fence = dma_fence_get_rcu(cursor->fence);
+		}
+	} while (cursor->fence && dma_fence_is_signaled(cursor->fence));
+}
+
+/**
+ * dma_resv_iter_first_unlocked - first fence in an unlocked dma_resv obj.
+ * @cursor: the cursor with the current position
+ *
+ * Returns the first fence from an unlocked dma_resv obj.
+ */
+struct dma_fence *dma_resv_iter_first_unlocked(struct dma_resv_iter *cursor)
+{
+	rcu_read_lock();
+	do {
+		dma_resv_iter_restart_unlocked(cursor);
+		dma_resv_iter_walk_unlocked(cursor);
+	} while (read_seqcount_retry(&cursor->obj->seq, cursor->seq));
+	rcu_read_unlock();
+
+	return cursor->fence;
+}
+EXPORT_SYMBOL(dma_resv_iter_first_unlocked);
+
+/**
+ * dma_resv_iter_next_unlocked - next fence in an unlocked dma_resv obj.
+ * @cursor: the cursor with the current position
+ *
+ * Returns the next fence from an unlocked dma_resv obj.
+ */
+struct dma_fence *dma_resv_iter_next_unlocked(struct dma_resv_iter *cursor)
+{
+	bool restart;
+
+	rcu_read_lock();
+	cursor->is_restarted = false;
+	restart = read_seqcount_retry(&cursor->obj->seq, cursor->seq);
+	do {
+		if (restart)
+			dma_resv_iter_restart_unlocked(cursor);
+		dma_resv_iter_walk_unlocked(cursor);
+		restart = true;
+	} while (read_seqcount_retry(&cursor->obj->seq, cursor->seq));
+	rcu_read_unlock();
+
+	return cursor->fence;
+}
+EXPORT_SYMBOL(dma_resv_iter_next_unlocked);
+
 /**
  * dma_resv_copy_fences - Copy all fences from src to dst.
  * @dst: the destination reservation object
diff --git a/include/linux/dma-resv.h b/include/linux/dma-resv.h
index 9100dd3dc21f..baf77a542392 100644
--- a/include/linux/dma-resv.h
+++ b/include/linux/dma-resv.h
@@ -149,6 +149,101 @@ struct dma_resv {
 	struct dma_resv_list __rcu *fence;
 };
 
+/**
+ * struct dma_resv_iter - current position into the dma_resv fences
+ *
+ * Don't touch this directly in the driver, use the accessor functions instead.
+ */
+struct dma_resv_iter {
+	/** @obj: The dma_resv object we iterate over */
+	struct dma_resv *obj;
+
+	/** @all_fences: If all fences should be returned */
+	bool all_fences;
+
+	/** @fence: the currently handled fence */
+	struct dma_fence *fence;
+
+	/** @seq: sequence number to check for modifications */
+	unsigned int seq;
+
+	/** @index: index into the shared fences */
+	unsigned int index;
+
+	/** @fences: the shared fences */
+	struct dma_resv_list *fences;
+
+	/** @is_restarted: true if this is the first returned fence */
+	bool is_restarted;
+};
+
+struct dma_fence *dma_resv_iter_first_unlocked(struct dma_resv_iter *cursor);
+struct dma_fence *dma_resv_iter_next_unlocked(struct dma_resv_iter *cursor);
+
+/**
+ * dma_resv_iter_begin - initialize a dma_resv_iter object
+ * @cursor: The dma_resv_iter object to initialize
+ * @obj: The dma_resv object which we want to iterate over
+ * @all_fences: If all fences should be returned or just the exclusive one
+ */
+static inline void dma_resv_iter_begin(struct dma_resv_iter *cursor,
+				       struct dma_resv *obj,
+				       bool all_fences)
+{
+	cursor->obj = obj;
+	cursor->all_fences = all_fences;
+	cursor->fence = NULL;
+}
+
+/**
+ * dma_resv_iter_end - cleanup a dma_resv_iter object
+ * @cursor: the dma_resv_iter object which should be cleaned up
+ *
+ * Make sure that the reference to the fence in the cursor is properly
+ * dropped.
+ */
+static inline void dma_resv_iter_end(struct dma_resv_iter *cursor)
+{
+	dma_fence_put(cursor->fence);
+}
+
+/**
+ * dma_resv_iter_is_exclusive - test if the current fence is the exclusive one
+ * @cursor: the cursor of the current position
+ *
+ * Returns true if the currently returned fence is the exclusive one.
+ */
+static inline bool dma_resv_iter_is_exclusive(struct dma_resv_iter *cursor)
+{
+	return cursor->index == -1;
+}
+
+/**
+ * dma_resv_iter_is_restarted - test if this is the first fence after a restart
+ * @cursor: the cursor with the current position
+ *
+ * Return true if this is the first fence in an iteration after a restart.
+ */
+static inline bool dma_resv_iter_is_restarted(struct dma_resv_iter *cursor)
+{
+	return cursor->is_restarted;
+}
+
+/**
+ * dma_resv_for_each_fence_unlocked - unlocked fence iterator
+ * @cursor: a struct dma_resv_iter pointer
+ * @fence: the current fence
+ *
+ * Iterate over the fences in a struct dma_resv object without holding the
+ * &dma_resv.lock and using RCU instead. The cursor needs to be initialized
+ * with dma_resv_iter_begin() and cleaned up with dma_resv_iter_end(). Inside
+ * the iterator a reference to the dma_fence is held and the RCU lock dropped.
+ * When the dma_resv is modified the iteration starts over again.
+ */
+#define dma_resv_for_each_fence_unlocked(cursor, fence)			\
+	for (fence = dma_resv_iter_first_unlocked(cursor);		\
+	     fence; fence = dma_resv_iter_next_unlocked(cursor))
+
 #define dma_resv_held(obj) lockdep_is_held(&(obj)->lock.base)
 #define dma_resv_assert_held(obj) lockdep_assert_held(&(obj)->lock.base)
 
-- 
2.25.1
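A small userspace model can check the visiting order this iterator is meant to produce. The exclusive fence comes first, then each shared fence in turn, and fences that have already signaled are skipped. This is a hypothetical sketch, not the dma-buf API. It has no RCU, no seqcount retry and no reference counting, and every model_* name is made up. The model stores "exclusive slot not yet visited" as a signed index of -1, so no unsigned wraparound is involved and the first shared fence is read at index 0.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical fence: only the "signaled" state matters here. */
struct model_fence {
	bool signaled;
};

/* Hypothetical reservation object: one exclusive slot plus a shared list. */
struct model_resv {
	struct model_fence *excl;     /* may be NULL */
	struct model_fence **shared;  /* may be NULL */
	size_t shared_count;
};

/* Hypothetical cursor; index -1 means the exclusive slot is next. */
struct model_iter {
	const struct model_resv *obj;
	bool all_fences;
	long index;
	struct model_fence *fence;
	bool is_exclusive;
};

static void model_iter_begin(struct model_iter *it,
			     const struct model_resv *obj, bool all_fences)
{
	assert(obj != NULL);
	it->obj = obj;
	it->all_fences = all_fences;
	it->index = -1;
	it->fence = NULL;
	it->is_exclusive = false;
}

/* Return the next unsignaled fence, or NULL once every slot was visited. */
static struct model_fence *model_iter_next(struct model_iter *it)
{
	for (;;) {
		if (it->index == -1) {
			/* Exclusive slot first; shared fences start at index 0. */
			it->index = 0;
			it->fence = it->obj->excl;
			it->is_exclusive = true;
		} else if (!it->all_fences || !it->obj->shared ||
			   (size_t)it->index >= it->obj->shared_count) {
			it->fence = NULL;
			it->is_exclusive = false;
			return NULL;
		} else {
			it->fence = it->obj->shared[it->index++];
			it->is_exclusive = false;
		}
		/* Skip empty slots and fences that already signaled. */
		if (it->fence && !it->fence->signaled)
			return it->fence;
	}
}
```

A real caller would use dma_resv_for_each_fence_unlocked() between dma_resv_iter_begin() and dma_resv_iter_end(). The model only illustrates the order in which fences are visited.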

4 years, 4 months

Deploying new iterator interface for dma-buf

by Christian König

Hopefully the last round for this.

Added dma_resv_iter_begin/end as requested by Daniel. Fixed a bunch of problems pointed out by the CI systems and found a few more myself.

Please review and/or comment,
Christian.

4 years, 4 months

Re: [Linaro-mm-sig] [RFC PATCH 1/4] dt-bindings: Add bindings for mtk,apu-drm

by Rob Herring

On Fri, Sep 17, 2021 at 02:59:42PM +0200, Alexandre Bailon wrote:
> This adds the device tree bindings for the APU DRM driver.
>
> Signed-off-by: Alexandre Bailon <abailon(a)baylibre.com>
> ---
>  .../devicetree/bindings/gpu/mtk,apu-drm.yaml | 38 +++++++++++++++++++
>  1 file changed, 38 insertions(+)
>  create mode 100644 Documentation/devicetree/bindings/gpu/mtk,apu-drm.yaml
>
> diff --git a/Documentation/devicetree/bindings/gpu/mtk,apu-drm.yaml b/Documentation/devicetree/bindings/gpu/mtk,apu-drm.yaml
> new file mode 100644
> index 0000000000000..6f432d3ea478c
> --- /dev/null
> +++ b/Documentation/devicetree/bindings/gpu/mtk,apu-drm.yaml
> @@ -0,0 +1,38 @@
> +# SPDX-License-Identifier: GPL-2.0-only OR BSD-2-Clause
> +%YAML 1.2
> +---
> +$id: http://devicetree.org/schemas/gpu/mediatek,apu-drm.yaml#
> +$schema: http://devicetree.org/meta-schemas/core.yaml#
> +
> +title: AI Processor Unit DRM

DRM is a linux thing, not h/w.

> +
> +properties:
> +  compatible:
> +    const: mediatek,apu-drm
> +
> +  remoteproc:

So is remoteproc. Why don't you have the remoteproc driver create the DRM device?

> +    maxItems: 2
> +    description:
> +      Handle to remoteproc devices controlling the APU
> +
> +  iova:
> +    maxItems: 1
> +    description:
> +      Address and size of virtual memory that could used by the APU

Why does this need to be in DT? If you need to reserve certain VAs, then this discussion[1] might be of interest.

Rob

[1] https://lore.kernel.org/all/YUIPCxnyRutMS47%2F@orome.fritz.box/

4 years, 4 months

[PATCH] panfrost: make mediatek_mt8183_supplies and mediatek_mt8183_pm_domains static

by Jiapeng Chong

These symbols are not used outside of panfrost_drv.c, so mark them static.

Fix the following sparse warnings:

drivers/gpu/drm/panfrost/panfrost_drv.c:641:12: warning: symbol 'mediatek_mt8183_supplies' was not declared. Should it be static?
drivers/gpu/drm/panfrost/panfrost_drv.c:642:12: warning: symbol 'mediatek_mt8183_pm_domains' was not declared. Should it be static?

Reported-by: Abaci Robot <abaci(a)linux.alibaba.com>
Signed-off-by: Jiapeng Chong <jiapeng.chong(a)linux.alibaba.com>
---
 drivers/gpu/drm/panfrost/panfrost_drv.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/panfrost/panfrost_drv.c b/drivers/gpu/drm/panfrost/panfrost_drv.c
index 077cbbf..82ad9a6 100644
--- a/drivers/gpu/drm/panfrost/panfrost_drv.c
+++ b/drivers/gpu/drm/panfrost/panfrost_drv.c
@@ -638,8 +638,8 @@ static int panfrost_remove(struct platform_device *pdev)
 	.vendor_quirk = panfrost_gpu_amlogic_quirk,
 };
 
-const char * const mediatek_mt8183_supplies[] = { "mali", "sram" };
-const char * const mediatek_mt8183_pm_domains[] = { "core0", "core1", "core2" };
+static const char * const mediatek_mt8183_supplies[] = { "mali", "sram" };
+static const char * const mediatek_mt8183_pm_domains[] = { "core0", "core1", "core2" };
 static const struct panfrost_compatible mediatek_mt8183_data = {
 	.num_supplies = ARRAY_SIZE(mediatek_mt8183_supplies),
 	.supply_names = mediatek_mt8183_supplies,
-- 
1.8.3.1
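The effect of this fix can be reproduced in a standalone C file. In this sketch, struct model_compatible and its fields are hypothetical stand-ins for the per-SoC match data. ARRAY_SIZE is defined locally in the usual sizeof way rather than taken from kernel headers. Because the tables are static, they have internal linkage. Nothing outside this translation unit can refer to them, so sparse no longer expects an external declaration for them.

```c
#include <assert.h>

/* Local definition for this sketch; the kernel ships its own ARRAY_SIZE. */
#define ARRAY_SIZE(arr) (sizeof(arr) / sizeof((arr)[0]))

/* static gives internal linkage: these tables are private to this file. */
static const char * const model_supplies[] = { "mali", "sram" };
static const char * const model_pm_domains[] = { "core0", "core1", "core2" };

/* Hypothetical stand-in for per-SoC driver match data. */
struct model_compatible {
	int num_supplies;
	const char * const *supply_names;
	int num_pm_domains;
	const char * const *pm_domain_names;
};

/* The counts are derived from the tables, so they cannot drift apart. */
static const struct model_compatible model_mt8183_data = {
	.num_supplies = ARRAY_SIZE(model_supplies),
	.supply_names = model_supplies,
	.num_pm_domains = ARRAY_SIZE(model_pm_domains),
	.pm_domain_names = model_pm_domains,
};

/* Compile-time checks on the table sizes (C11 static_assert). */
static_assert(ARRAY_SIZE(model_supplies) == 2, "mali and sram");
static_assert(ARRAY_SIZE(model_pm_domains) == 3, "core0 to core2");
```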

4 years, 4 months

Re: [Linaro-mm-sig] [RFC PATCH 1/4] dt-bindings: Add bindings for mtk,apu-drm

by Rob Herring

On Fri, 17 Sep 2021 14:59:42 +0200, Alexandre Bailon wrote:
> This adds the device tree bindings for the APU DRM driver.
>
> Signed-off-by: Alexandre Bailon <abailon(a)baylibre.com>
> ---
>  .../devicetree/bindings/gpu/mtk,apu-drm.yaml | 38 +++++++++++++++++++
>  1 file changed, 38 insertions(+)
>  create mode 100644 Documentation/devicetree/bindings/gpu/mtk,apu-drm.yaml
>

My bot found errors running 'make DT_CHECKER_FLAGS=-m dt_binding_check'
on your patch (DT_CHECKER_FLAGS is new in v5.13):

yamllint warnings/errors:

dtschema/dtc warnings/errors:
/builds/robherring/linux-dt-review/Documentation/devicetree/bindings/gpu/mtk,apu-drm.yaml: 'maintainers' is a required property
	hint: Metaschema for devicetree binding documentation
	from schema $id: http://devicetree.org/meta-schemas/base.yaml#
./Documentation/devicetree/bindings/gpu/mtk,apu-drm.yaml: $id: relative path/filename doesn't match actual path or filename
	expected: http://devicetree.org/schemas/gpu/mtk,apu-drm.yaml#
/builds/robherring/linux-dt-review/Documentation/devicetree/bindings/gpu/mtk,apu-drm.yaml: ignoring, error in schema:
warning: no schema found in file: ./Documentation/devicetree/bindings/gpu/mtk,apu-drm.yaml
Documentation/devicetree/bindings/gpu/mtk,apu-drm.example.dts:19.15-23.11: Warning (unit_address_vs_reg): /example-0/apu@0: node has a unit name, but no reg or ranges property
Documentation/devicetree/bindings/gpu/mtk,apu-drm.example.dt.yaml:0:0: /example-0/apu@0: failed to match any schema with compatible: ['mediatek,apu-drm']

doc reference errors (make refcheckdocs):

See https://patchwork.ozlabs.org/patch/1529388

This check can fail if there are any dependencies. The base for a patch
series is generally the most recent rc1.

If you already ran 'make dt_binding_check' and didn't see the above
error(s), then make sure 'yamllint' is installed and dt-schema is up to
date:

pip3 install dtschema --upgrade

Please check and re-submit.

4 years, 5 months
