Linaro-mm-sig

linaro-mm-sig@lists.linaro.org

8 participants
2915 discussions

Re: [PATCH v6 1/5] drm/panthor: introduce job cycle and timestamp accounting

by Steven Price

On 20/09/2024 23:36, Adrián Larumbe wrote: > Hi Steve, thanks for the review. Hi Adrián, > I've applied all of your suggestions for the next patch series revision, so I'll > only be answering to your question about the calc_profiling_ringbuf_num_slots > function further down below. > [...] >>> @@ -3003,6 +3190,34 @@ static const struct drm_sched_backend_ops panthor_queue_sched_ops = { >>> .free_job = queue_free_job, >>> }; >>> >>> +static u32 calc_profiling_ringbuf_num_slots(struct panthor_device *ptdev, >>> + u32 cs_ringbuf_size) >>> +{ >>> + u32 min_profiled_job_instrs = U32_MAX; >>> + u32 last_flag = fls(PANTHOR_DEVICE_PROFILING_ALL); >>> + >>> + /* >>> + * We want to calculate the minimum size of a profiled job's CS, >>> + * because since they need additional instructions for the sampling >>> + * of performance metrics, they might take up further slots in >>> + * the queue's ringbuffer. This means we might not need as many job >>> + * slots for keeping track of their profiling information. What we >>> + * need is the maximum number of slots we should allocate to this end, >>> + * which matches the maximum number of profiled jobs we can place >>> + * simultaneously in the queue's ring buffer. >>> + * That has to be calculated separately for every single job profiling >>> + * flag, but not in the case job profiling is disabled, since unprofiled >>> + * jobs don't need to keep track of this at all. >>> + */ >>> + for (u32 i = 0; i < last_flag; i++) { >>> + if (BIT(i) & PANTHOR_DEVICE_PROFILING_ALL) >>> + min_profiled_job_instrs = >>> + min(min_profiled_job_instrs, calc_job_credits(BIT(i))); >>> + } >>> + >>> + return DIV_ROUND_UP(cs_ringbuf_size, min_profiled_job_instrs * sizeof(u64)); >>> +} >> >> I may be missing something, but is there a situation where this is >> different to calc_job_credits(0)? AFAICT the infrastructure you've added >> can only add extra instructions to the no-flags case - whereas this >> implies you're thinking that instructions may also be removed (or replaced). >> >> Steve > > Since we create a separate kernel BO to hold the profiling information slot, we > need one that would be able to accomodate as many slots as the maximum number of > profiled jobs we can insert simultaneously into the queue's ring buffer. Because > profiled jobs always take more instructions than unprofiled ones, then we would > usually need fewer slots than the number of unprofiled jobs we could insert at > once in the ring buffer. > > Because we represent profiling metrics with a bit mask, then we need to test the > size of the CS for every single metric enabled in isolation, since enabling more > than one will always mean a bigger CS, and therefore fewer jobs tracked at once > in the queue's ring buffer. > > In our case, calling calc_job_credits(0) would simply tell us the number of > instructions we need for a normal job with no profiled features enabled, which > would always requiere less instructions than profiled ones, and therefore more > slots in the profiling info kernel BO. But we don't need to keep track of > profiling numbers for unprofiled jobs, so there's no point in calculating this > number. > > At first I was simply allocating a profiling info kernel BO as big as the number > of simultaneous unprofiled job slots in the ring queue, but Boris pointed out > that since queue ringbuffers can be as big as 2GiB, a lot of this memory would > be wasted, since profiled jobs always require more slots because they hold more > instructions, so fewer profiling slots in said kernel BO. > > The value of this approach will eventually manifest if we decided to keep track of > more profiling metrics, since this code won't have to change at all, other than > adding new profiling flags in the panthor_device_profiling_flags enum. Thanks for the detailed explanation. I think what I was missing is that the loop is checking each bit flag independently and *not* checking calc_job_credits(0). The check for (BIT(i) & PANTHOR_DEVICE_PROFILING_ALL) is probably what confused me - that should be completely redundant. Or at least we need something more intelligent if we have profiling bits which are not mutually compatible. I'm also not entirely sure that the amount of RAM saved is significant, but you've already written the code so we might as well have the saving ;) Thanks, Steve > Regards, > Adrian > >>> + >>> static struct panthor_queue * >>> group_create_queue(struct panthor_group *group, >>> const struct drm_panthor_queue_create *args) >>> @@ -3056,9 +3271,35 @@ group_create_queue(struct panthor_group *group, >>> goto err_free_queue; >>> } >>> >>> + queue->profiling.slot_count = >>> + calc_profiling_ringbuf_num_slots(group->ptdev, args->ringbuf_size); >>> + >>> + queue->profiling.slots = >>> + panthor_kernel_bo_create(group->ptdev, group->vm, >>> + queue->profiling.slot_count * >>> + sizeof(struct panthor_job_profiling_data), >>> + DRM_PANTHOR_BO_NO_MMAP, >>> + DRM_PANTHOR_VM_BIND_OP_MAP_NOEXEC | >>> + DRM_PANTHOR_VM_BIND_OP_MAP_UNCACHED, >>> + PANTHOR_VM_KERNEL_AUTO_VA); >>> + >>> + if (IS_ERR(queue->profiling.slots)) { >>> + ret = PTR_ERR(queue->profiling.slots); >>> + goto err_free_queue; >>> + } >>> + >>> + ret = panthor_kernel_bo_vmap(queue->profiling.slots); >>> + if (ret) >>> + goto err_free_queue; >>> + >>> + /* >>> + * Credit limit argument tells us the total number of instructions >>> + * across all CS slots in the ringbuffer, with some jobs requiring >>> + * twice as many as others, depending on their profiling status. >>> + */ >>> ret = drm_sched_init(&queue->scheduler, &panthor_queue_sched_ops, >>> group->ptdev->scheduler->wq, 1, >>> - args->ringbuf_size / (NUM_INSTRS_PER_SLOT * sizeof(u64)), >>> + args->ringbuf_size / sizeof(u64), >>> 0, msecs_to_jiffies(JOB_TIMEOUT_MS), >>> group->ptdev->reset.wq, >>> NULL, "panthor-queue", group->ptdev->base.dev); >>> @@ -3354,6 +3595,7 @@ panthor_job_create(struct panthor_file *pfile, >>> { >>> struct panthor_group_pool *gpool = pfile->groups; >>> struct panthor_job *job; >>> + u32 credits; >>> int ret; >>> >>> if (qsubmit->pad) >>> @@ -3407,9 +3649,16 @@ panthor_job_create(struct panthor_file *pfile, >>> } >>> } >>> >>> + job->profiling.mask = pfile->ptdev->profile_mask; >>> + credits = calc_job_credits(job->profiling.mask); >>> + if (credits == 0) { >>> + ret = -EINVAL; >>> + goto err_put_job; >>> + } >>> + >>> ret = drm_sched_job_init(&job->base, >>> &job->group->queues[job->queue_idx]->entity, >>> - 1, job->group); >>> + credits, job->group); >>> if (ret) >>> goto err_put_job; >>> >

9 months, 2 weeks

[RFC PATCH] dma-buf/dma-fence: Use a successful read_trylock() annotation for dma_fence_begin_signalling()

by Thomas Hellström

Condsider the following call sequence: /* Upper layer */ dma_fence_begin_signalling(); lock(tainted_shared_lock); /* Driver callback */ dma_fence_begin_signalling(); ... The driver might here use a utility that is annotated as intended for the dma-fence signalling critical path. Now if the upper layer isn't correctly annotated yet for whatever reason, resulting in /* Upper layer */ lock(tainted_shared_lock); /* Driver callback */ dma_fence_begin_signalling(); We will receive a false lockdep locking order violation notification from dma_fence_begin_signalling(). However entering a dma-fence signalling critical section itself doesn't block and could not cause a deadlock. So use a successful read_trylock() annotation instead for dma_fence_begin_signalling(). That will make sure that the locking order is correctly registered in the first case, and doesn't register any locking order in the second case. The alternative is of course to make sure that the "Upper layer" is always correctly annotated. But experience shows that's not easily achievable in all cases. Signed-off-by: Thomas Hellström <thomas.hellstrom(a)linux.intel.com> --- drivers/dma-buf/dma-fence.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/drivers/dma-buf/dma-fence.c b/drivers/dma-buf/dma-fence.c index f177c56269bb..17f632768ef9 100644 --- a/drivers/dma-buf/dma-fence.c +++ b/drivers/dma-buf/dma-fence.c @@ -308,8 +308,8 @@ bool dma_fence_begin_signalling(void) if (in_atomic()) return true; - /* ... and non-recursive readlock */ - lock_acquire(&dma_fence_lockdep_map, 0, 0, 1, 1, NULL, _RET_IP_); + /* ... and non-recursive successful read_trylock */ + lock_acquire(&dma_fence_lockdep_map, 0, 1, 1, 1, NULL, _RET_IP_); return false; } @@ -340,7 +340,7 @@ void __dma_fence_might_wait(void) lock_map_acquire(&dma_fence_lockdep_map); lock_map_release(&dma_fence_lockdep_map); if (tmp) - lock_acquire(&dma_fence_lockdep_map, 0, 0, 1, 1, NULL, _THIS_IP_); + lock_acquire(&dma_fence_lockdep_map, 0, 1, 1, 1, NULL, _THIS_IP_); } #endif -- 2.39.2

9 months, 2 weeks

[PATCH 1/2] dma-buf/dma-fence: remove unnecessary callbacks

by Christian König

The fence_value_str and timeline_value_str callbacks were just an unnecessary abstraction in the SW sync implementation. The only caller of those callbacks already knew that the fence in questions is a timeline_fence. So print the values directly instead of using a redirection. Additional to that remove the implementations from virtgpu and vgem. As far as I can see those were never used in the first place. Signed-off-by: Christian König <christian.koenig(a)amd.com> --- drivers/dma-buf/sw_sync.c | 16 ---------------- drivers/dma-buf/sync_debug.c | 21 ++------------------- drivers/gpu/drm/vgem/vgem_fence.c | 15 --------------- drivers/gpu/drm/virtio/virtgpu_fence.c | 16 ---------------- include/linux/dma-fence.h | 21 --------------------- 5 files changed, 2 insertions(+), 87 deletions(-) diff --git a/drivers/dma-buf/sw_sync.c b/drivers/dma-buf/sw_sync.c index c353029789cf..f7ce4c6b8b8e 100644 --- a/drivers/dma-buf/sw_sync.c +++ b/drivers/dma-buf/sw_sync.c @@ -178,20 +178,6 @@ static bool timeline_fence_enable_signaling(struct dma_fence *fence) return true; } -static void timeline_fence_value_str(struct dma_fence *fence, - char *str, int size) -{ - snprintf(str, size, "%lld", fence->seqno); -} - -static void timeline_fence_timeline_value_str(struct dma_fence *fence, - char *str, int size) -{ - struct sync_timeline *parent = dma_fence_parent(fence); - - snprintf(str, size, "%d", parent->value); -} - static void timeline_fence_set_deadline(struct dma_fence *fence, ktime_t deadline) { struct sync_pt *pt = dma_fence_to_sync_pt(fence); @@ -214,8 +200,6 @@ static const struct dma_fence_ops timeline_fence_ops = { .enable_signaling = timeline_fence_enable_signaling, .signaled = timeline_fence_signaled, .release = timeline_fence_release, - .fence_value_str = timeline_fence_value_str, - .timeline_value_str = timeline_fence_timeline_value_str, .set_deadline = timeline_fence_set_deadline, }; diff --git a/drivers/dma-buf/sync_debug.c b/drivers/dma-buf/sync_debug.c index 237bce21d1e7..270daae7d89a 100644 --- a/drivers/dma-buf/sync_debug.c +++ b/drivers/dma-buf/sync_debug.c @@ -82,25 +82,8 @@ static void sync_print_fence(struct seq_file *s, seq_printf(s, "@%lld.%09ld", (s64)ts64.tv_sec, ts64.tv_nsec); } - if (fence->ops->timeline_value_str && - fence->ops->fence_value_str) { - char value[64]; - bool success; - - fence->ops->fence_value_str(fence, value, sizeof(value)); - success = strlen(value); - - if (success) { - seq_printf(s, ": %s", value); - - fence->ops->timeline_value_str(fence, value, - sizeof(value)); - - if (strlen(value)) - seq_printf(s, " / %s", value); - } - } - + seq_printf(s, ": %lld", fence->seqno); + seq_printf(s, " / %d", parent->value); seq_putc(s, '\n'); } diff --git a/drivers/gpu/drm/vgem/vgem_fence.c b/drivers/gpu/drm/vgem/vgem_fence.c index e15754178395..5298d995faa7 100644 --- a/drivers/gpu/drm/vgem/vgem_fence.c +++ b/drivers/gpu/drm/vgem/vgem_fence.c @@ -53,25 +53,10 @@ static void vgem_fence_release(struct dma_fence *base) dma_fence_free(&fence->base); } -static void vgem_fence_value_str(struct dma_fence *fence, char *str, int size) -{ - snprintf(str, size, "%llu", fence->seqno); -} - -static void vgem_fence_timeline_value_str(struct dma_fence *fence, char *str, - int size) -{ - snprintf(str, size, "%llu", - dma_fence_is_signaled(fence) ? fence->seqno : 0); -} - static const struct dma_fence_ops vgem_fence_ops = { .get_driver_name = vgem_fence_get_driver_name, .get_timeline_name = vgem_fence_get_timeline_name, .release = vgem_fence_release, - - .fence_value_str = vgem_fence_value_str, - .timeline_value_str = vgem_fence_timeline_value_str, }; static void vgem_fence_timeout(struct timer_list *t) diff --git a/drivers/gpu/drm/virtio/virtgpu_fence.c b/drivers/gpu/drm/virtio/virtgpu_fence.c index f28357dbde35..44c1d8ef3c4d 100644 --- a/drivers/gpu/drm/virtio/virtgpu_fence.c +++ b/drivers/gpu/drm/virtio/virtgpu_fence.c @@ -49,26 +49,10 @@ static bool virtio_gpu_fence_signaled(struct dma_fence *f) return false; } -static void virtio_gpu_fence_value_str(struct dma_fence *f, char *str, int size) -{ - snprintf(str, size, "[%llu, %llu]", f->context, f->seqno); -} - -static void virtio_gpu_timeline_value_str(struct dma_fence *f, char *str, - int size) -{ - struct virtio_gpu_fence *fence = to_virtio_gpu_fence(f); - - snprintf(str, size, "%llu", - (u64)atomic64_read(&fence->drv->last_fence_id)); -} - static const struct dma_fence_ops virtio_gpu_fence_ops = { .get_driver_name = virtio_gpu_get_driver_name, .get_timeline_name = virtio_gpu_get_timeline_name, .signaled = virtio_gpu_fence_signaled, - .fence_value_str = virtio_gpu_fence_value_str, - .timeline_value_str = virtio_gpu_timeline_value_str, }; struct virtio_gpu_fence *virtio_gpu_fence_alloc(struct virtio_gpu_device *vgdev, diff --git a/include/linux/dma-fence.h b/include/linux/dma-fence.h index e7ad819962e3..cf91cae6e30f 100644 --- a/include/linux/dma-fence.h +++ b/include/linux/dma-fence.h @@ -238,27 +238,6 @@ struct dma_fence_ops { */ void (*release)(struct dma_fence *fence); - /** - * @fence_value_str: - * - * Callback to fill in free-form debug info specific to this fence, like - * the sequence number. - * - * This callback is optional. - */ - void (*fence_value_str)(struct dma_fence *fence, char *str, int size); - - /** - * @timeline_value_str: - * - * Fills in the current value of the timeline as a string, like the - * sequence number. Note that the specific fence passed to this function - * should not matter, drivers should only use it to look up the - * corresponding timeline structures. - */ - void (*timeline_value_str)(struct dma_fence *fence, - char *str, int size); - /** * @set_deadline: * -- 2.34.1

9 months, 2 weeks

Re: [PATCH v4 2/2] mtd: rawnand: nuvoton: add new driver for the Nuvoton MA35 SoC

by Sascha Hauer

Hi, The driver has a few minor whitespace issues, please run through checkpatch.pl to catch them. Some more things inline. On Wed, Sep 18, 2024 at 09:03:08AM +0000, Hui-Ping Chen wrote: > Nuvoton MA35 SoCs NAND Flash Interface Controller > supports 2kiB, 4kiB and 8kiB page size, and up to > 8-bit, 12-bit, and 24-bit hardware ECC calculation > circuit to protect data. > > Signed-off-by: Hui-Ping Chen <hpchen0nvt(a)gmail.com> > --- > drivers/mtd/nand/raw/Kconfig | 8 + > drivers/mtd/nand/raw/Makefile | 1 + > drivers/mtd/nand/raw/nuvoton_ma35d1_nand.c | 935 +++++++++++++++++++++ > 3 files changed, 944 insertions(+) > create mode 100644 drivers/mtd/nand/raw/nuvoton_ma35d1_nand.c > > +#define SKIP_SPARE_BYTES 4 Unused, please drop. > +static int ma35_nfi_ecc_check(struct nand_chip *chip, unsigned long addr) > +{ > + struct ma35_nand_info *nand = nand_get_controller_data(chip); > + struct mtd_info *mtd = nand_to_mtd(chip); > + int status, i, j, nchunks = 0; status should be unsigned. > + int report_err = 0; > + int err_cnt = 0; > + > + nchunks = mtd->writesize / chip->ecc.steps; > + if (nchunks < 4) > + nchunks = 1; > + else > + nchunks /= 4; > + > + for (j = 0; j < nchunks; j++) { > + status = readl(nand->regs + MA35_NFI_REG_NANDECCES0 + j*4); > + if (!status) > + continue; > + > + for (i = 0; i < 4; i++) { > + if (!(status & ECC_STATUS_MASK)) { > + /* No error */ > + status >>= 8; > + continue; > + > + } else if ((status & ECC_STATUS_MASK) == 0x01) { > + /* Correctable error */ > + err_cnt = (status >> 2) & ECC_ERR_CNT_MASK; > + dev_warn(nand->dev, "nchunks (%d, %d) have %d error!\n", > + j, i, err_cnt); Correctable bitflips are expected. Please don't spam the log with it. > + ma35_nfi_correct(nand, j*4+i, err_cnt, (u8 *)addr); > + report_err += err_cnt; > + > + } else { > + /* uncorrectable error */ > + dev_warn(nand->dev, "uncorrectable error! 0x%4x\n", status); > + return -1; > + } > + status >>= 8; > + } > + } > + return report_err; There are a few things wrong here. Your chip->ecc.read_page op must return the maximum number of bitflips occured on a subpage while reading a page. To archieve this I suggest you fix the return value of this function accordingly and call it from chip->ecc.read_page rather than from the interrupt handler. Nevertheless mtd->ecc_stats.corrected counts the total number of bitflips, so you must handle this counter in this function. See rk_nfc_read_page_hwecc() as an example of a driver which gets it right. The background is that we have to rewrite the page once one ECC block hits a critical bitflip limit. A whole page might be fine when the bitflips are evenly distributed across the subpages, but it's not when all bitflips are occur in a single subpage. > +static int ma35_nand_do_write(struct nand_chip *chip, const u8 *addr, u32 len) > +{ > + struct ma35_nand_info *nand = nand_get_controller_data(chip); > + struct mtd_info *mtd = nand_to_mtd(chip); > + dma_addr_t dma_addr; > + int ret = 0, i; > + u32 val, reg; > + > + ma35_nand_target_enable(nand); > + > + if (len != mtd->writesize) { > + for (i = 0; i < len; i++) > + writel(addr[i], nand->regs + MA35_NFI_REG_NANDDATA); > + ma35_nand_target_disable(nand); > + return ret; > + } > + > + /* Check the DMA status before enabling the DMA */ > + ret = readl_poll_timeout(nand->regs + MA35_NFI_REG_DMACTL, val, > + !(val & DMA_BUSY), 50, HZ/2); > + if (ret) > + dev_warn(nand->dev, "dma busy\n"); > + > + /* Reinitial dmac */ > + ma35_nand_dmac_init(nand); The function name already says it and the comment doesn't offer any additional information. Please drop such comments. > + > + writel(mtd->oobsize, nand->regs + MA35_NFI_REG_NANDRACTL); > + > + /* setup and start DMA using dma_addr */ > + writel(INT_DMA, nand->regs + MA35_NFI_REG_NANDINTEN); > + /* To mark this page as dirty. */ > + reg = readl(nand->regs + MA35_NFI_REG_NANDRA0); > + if (reg & 0xffff0000) > + writel(reg & 0xffff, nand->regs + MA35_NFI_REG_NANDRA0); > + > + /* Fill dma_addr */ > + dma_addr = dma_map_single(nand->dev, (void *)addr, len, DMA_TO_DEVICE); > + dma_sync_single_for_device(nand->dev, dma_addr, len, DMA_TO_DEVICE); > + ret = dma_mapping_error(nand->dev, dma_addr); > + if (ret) { > + dev_err(nand->dev, "dma mapping error\n"); > + return -EINVAL; > + } Call dma_sync_single_for_device() after you have checked for an error with dma_mapping_error(). That said, I think calling dma_sync_single_for_device() after dma_map_single() is unnecessary. > + > + writel((unsigned long)dma_addr, nand->regs + MA35_NFI_REG_DMASA); > + writel(readl(nand->regs + MA35_NFI_REG_NANDCTL) | DMA_W_EN, > + nand->regs + MA35_NFI_REG_NANDCTL); > + ret = wait_for_completion_timeout(&nand->complete, msecs_to_jiffies(1000)); > + if (!ret) { > + dev_err(nand->dev, "write timeout\n"); > + ret = -ETIMEDOUT; > + } > + > + dma_unmap_single(nand->dev, dma_addr, len, DMA_TO_DEVICE); > + > + ma35_nand_target_disable(nand); > + > + return ret; > +} > + > +static int ma35_nand_do_read(struct nand_chip *chip, const u8 *addr, u32 len) The addr argument shouldn't be const. You are supposed to write to this buffer and you actually do so. > +{ > + struct ma35_nand_info *nand = nand_get_controller_data(chip); > + struct mtd_info *mtd = nand_to_mtd(chip); > + u8 *ptr = (u8 *)addr; > + dma_addr_t dma_addr; > + int ret = 0, i; > + u32 val; > + > + ma35_nand_target_enable(nand); > + > + if (len != mtd->writesize) { > + for (i = 0; i < len; i++) > + *(ptr+i) = (u8)readl(nand->regs + MA35_NFI_REG_NANDDATA); > + ma35_nand_target_disable(nand); > + return ret; Just return 0 here. It's easier to read than having to look up the initialization value. > + } > + > + /* Check the DMA status before enabling the DMA */ > + ret = readl_poll_timeout(nand->regs + MA35_NFI_REG_DMACTL, val, > + !(val & DMA_BUSY), 50, HZ/2); > + if (ret) > + dev_warn(nand->dev, "dma busy\n"); > + > + /* Reinitial dmac */ > + ma35_nand_dmac_init(nand); > + > + writel(mtd->oobsize, nand->regs + MA35_NFI_REG_NANDRACTL); > + > + /* setup and start DMA using dma_addr */ > + dma_addr = dma_map_single(nand->dev, (void *)addr, len, DMA_FROM_DEVICE); > + ret = dma_mapping_error(nand->dev, dma_addr); > + if (ret) { > + dev_err(nand->dev, "dma mapping error\n"); > + return -EINVAL; > + } > + nand->dma_buf = (u8 *)addr; > + nand->dma_addr = dma_addr; > + > + writel((unsigned long)dma_addr, nand->regs + MA35_NFI_REG_DMASA); > + writel(readl(nand->regs + MA35_NFI_REG_NANDCTL) | DMA_R_EN, > + nand->regs + MA35_NFI_REG_NANDCTL); > + ret = wait_for_completion_timeout(&nand->complete, msecs_to_jiffies(1000)); > + if (!ret) { > + dev_err(nand->dev, "read timeout\n"); > + ret = -ETIMEDOUT; > + } > + > + dma_sync_single_for_cpu(nand->dev, dma_addr, len, DMA_FROM_DEVICE); > + dma_unmap_single(nand->dev, dma_addr, len, DMA_FROM_DEVICE); No need to call dma_sync_single_for_cpu() before dma_unmap_single(). > + > + ma35_nand_target_disable(nand); > + > + return ret; > +} > + > + > +static int ma35_nand_write_page_hwecc(struct nand_chip *chip, const u8 *buf, > + int oob_required, int page) > +{ > + struct mtd_info *mtd = nand_to_mtd(chip); > + u8 *ecc_calc = chip->ecc.calc_buf; Make this a void * to get rid of the explicit casting below. > + > + ma35_clear_spare(chip, mtd->oobsize); > + ma35_write_spare(chip, mtd->oobsize - chip->ecc.total, (u32 *)chip->oob_poi); > + > + nand_prog_page_begin_op(chip, page, 0, buf, mtd->writesize); > + nand_prog_page_end_op(chip); > + > + /* Copy parity code in NANDRA to calc */ > + ma35_read_spare(chip, chip->ecc.total, (u32 *)ecc_calc, > + mtd->oobsize - chip->ecc.total); > + > + /* Copy parity code in calc to oob_poi */ > + memcpy((void *)(chip->oob_poi + (mtd->oobsize - chip->ecc.total)), > + (void *)ecc_calc, chip->ecc.total); > + > + return 0; > +} > + > +static irqreturn_t ma35_nand_irq(int irq, void *id) > +{ > + struct ma35_nand_info *nand = (struct ma35_nand_info *)id; > + struct mtd_info *mtd = nand_to_mtd(&nand->chip); > + int stat = 0; > + u32 isr; > + > + spin_lock(&nand->dma_lock); > + > + isr = readl(nand->regs + MA35_NFI_REG_NANDINTSTS); > + if (isr & INT_ECC) { > + dma_sync_single_for_cpu(nand->dev, nand->dma_addr, mtd->writesize, > + DMA_FROM_DEVICE); > + stat = ma35_nfi_ecc_check(&nand->chip, (unsigned long)nand->dma_buf); nand->dma_buf already is a pointer which you cast to unisgned long here and back to a pointer in ma35_nfi_ecc_check(). ma35_nfi_ecc_check() should take a poiner instead. > + if (stat < 0) { > + mtd->ecc_stats.failed++; > + writel(DMA_RST | DMA_EN, nand->regs + MA35_NFI_REG_DMACTL); > + writel(readl(nand->regs + MA35_NFI_REG_NANDCTL) | SWRST, > + nand->regs + MA35_NFI_REG_NANDCTL); > + } else if (stat > 0) { > + mtd->ecc_stats.corrected += stat; /* Add corrected bit count */ > + } > + writel(INT_ECC, nand->regs + MA35_NFI_REG_NANDINTSTS); > + } > + if (isr & INT_DMA) { > + writel(INT_DMA, nand->regs + MA35_NFI_REG_NANDINTSTS); > + complete(&nand->complete); > + } > + spin_unlock(&nand->dma_lock); > + > + return IRQ_HANDLED; > +} > + > +static int ma35_nfc_exec_op(struct nand_chip *chip, > + const struct nand_operation *op, > + bool check_only) > +{ > + struct ma35_nand_info *nand = nand_get_controller_data(chip); > + u32 i, reg; > + int ret = 0; > + > + if (check_only) > + return 0; > + > + ma35_nand_target_enable(nand); > + reg = readl(nand->regs + MA35_NFI_REG_NANDINTSTS); > + reg |= INT_RB0; > + writel(reg, nand->regs + MA35_NFI_REG_NANDINTSTS); > + > + for (i = 0; i < op->ninstrs; i++) { > + ret = ma35_nfc_exec_instr(chip, &op->instrs[i]); > + if (ret) > + break; > + } The way ma35_nand_target_[en|dis]able() is called looks inconsistent. This function calls ma35_nand_target_enable(), so I would expect that the corresponding ma35_nand_target_disable() should be called here as well. ma35_nand_do_read() is called from here which has its own call to ma35_nand_target_enable(), but it doesn't call ma35_nand_target_disable() from all of its return pathes. > + > + ret = devm_request_irq(&pdev->dev, nand->irq, ma35_nand_irq, > + IRQF_TRIGGER_HIGH, "ma35d1-nand", nand); > + if (ret) { > + dev_err(&pdev->dev, "failed to request NAND irq\n"); > + clk_disable_unprepare(nand->clk); You used devm_clk_get_enabled(), so this will be done automatically. > + return -ENXIO; > + } > + > + nand->chip.controller = &nand->controller; > + platform_set_drvdata(pdev, nand); > + > + chip->options |= NAND_NO_SUBPAGE_WRITE | NAND_USES_DMA; > + > + /* set default mode in case dt entry is missing */ > + chip->ecc.engine_type = NAND_ECC_ENGINE_TYPE_ON_HOST; > + > + chip->ecc.write_page = ma35_nand_write_page_hwecc; > + chip->ecc.read_page = ma35_nand_read_page_hwecc; > + chip->ecc.read_oob = ma35_nand_read_oob_hwecc; > + > + mtd = nand_to_mtd(chip); > + mtd->priv = chip; > + mtd->owner = THIS_MODULE; > + mtd->dev.parent = &pdev->dev; > + > + writel(NAND_EN, nand->regs + MA35_NFI_REG_GCTL); > + > + ret = nand_scan(chip, 1); > + if (ret) > + return ret; > + > + ret = mtd_device_register(mtd, NULL, 0); > + if (ret) { > + nand_cleanup(chip); > + devm_kfree(&pdev->dev, nand); Unnecessary free. Drop it. > + return ret; > + } > + > + return ret; > +} > + > +static void ma35_nand_remove(struct platform_device *pdev) > +{ > + struct ma35_nand_info *nand = platform_get_drvdata(pdev); > + int ret; > + > + devm_free_irq(&pdev->dev, nand->irq, nand); devm_ is a mechanism to let resources be freed automatically. There's normally no need to do this manually. Sascha -- Pengutronix e.K. | | Steuerwalder Str. 21 | http://www.pengutronix.de/ | 31137 Hildesheim, Germany | Phone: +49-5121-206917-0 | Amtsgericht Hildesheim, HRA 2686 | Fax: +49-5121-206917-5555 |

9 months, 2 weeks

Re: [PATCH v6 2/5] drm/panthor: record current and maximum device clock frequencies

by Steven Price

On 13/09/2024 13:42, Adrián Larumbe wrote: > In order to support UM in calculating rates of GPU utilisation, the current > operating and maximum GPU clock frequencies must be recorded during device > initialisation, and also during OPP state transitions. > > Signed-off-by: Adrián Larumbe <adrian.larumbe(a)collabora.com> Reviewed-by: Steven Price <steven.price(a)arm.com> > --- > drivers/gpu/drm/panthor/panthor_devfreq.c | 18 +++++++++++++++++- > drivers/gpu/drm/panthor/panthor_device.h | 6 ++++++ > 2 files changed, 23 insertions(+), 1 deletion(-) > > diff --git a/drivers/gpu/drm/panthor/panthor_devfreq.c b/drivers/gpu/drm/panthor/panthor_devfreq.c > index c6d3c327cc24..9d0f891b9b53 100644 > --- a/drivers/gpu/drm/panthor/panthor_devfreq.c > +++ b/drivers/gpu/drm/panthor/panthor_devfreq.c > @@ -62,14 +62,20 @@ static void panthor_devfreq_update_utilization(struct panthor_devfreq *pdevfreq) > static int panthor_devfreq_target(struct device *dev, unsigned long *freq, > u32 flags) > { > + struct panthor_device *ptdev = dev_get_drvdata(dev); > struct dev_pm_opp *opp; > + int err; > > opp = devfreq_recommended_opp(dev, freq, flags); > if (IS_ERR(opp)) > return PTR_ERR(opp); > dev_pm_opp_put(opp); > > - return dev_pm_opp_set_rate(dev, *freq); > + err = dev_pm_opp_set_rate(dev, *freq); > + if (!err) > + ptdev->current_frequency = *freq; > + > + return err; > } > > static void panthor_devfreq_reset(struct panthor_devfreq *pdevfreq) > @@ -130,6 +136,7 @@ int panthor_devfreq_init(struct panthor_device *ptdev) > struct panthor_devfreq *pdevfreq; > struct dev_pm_opp *opp; > unsigned long cur_freq; > + unsigned long freq = ULONG_MAX; > int ret; > > pdevfreq = drmm_kzalloc(&ptdev->base, sizeof(*ptdev->devfreq), GFP_KERNEL); > @@ -161,6 +168,7 @@ int panthor_devfreq_init(struct panthor_device *ptdev) > return PTR_ERR(opp); > > panthor_devfreq_profile.initial_freq = cur_freq; > + ptdev->current_frequency = cur_freq; > > /* Regulator coupling only takes care of synchronizing/balancing voltage > * updates, but the coupled regulator needs to be enabled manually. > @@ -204,6 +212,14 @@ int panthor_devfreq_init(struct panthor_device *ptdev) > > dev_pm_opp_put(opp); > > + /* Find the fastest defined rate */ > + opp = dev_pm_opp_find_freq_floor(dev, &freq); > + if (IS_ERR(opp)) > + return PTR_ERR(opp); > + ptdev->fast_rate = freq; > + > + dev_pm_opp_put(opp); > + > /* > * Setup default thresholds for the simple_ondemand governor. > * The values are chosen based on experiments. > diff --git a/drivers/gpu/drm/panthor/panthor_device.h b/drivers/gpu/drm/panthor/panthor_device.h > index a48e30d0af30..2109905813e8 100644 > --- a/drivers/gpu/drm/panthor/panthor_device.h > +++ b/drivers/gpu/drm/panthor/panthor_device.h > @@ -184,6 +184,12 @@ struct panthor_device { > > /** @profile_mask: User-set profiling flags for job accounting. */ > u32 profile_mask; > + > + /** @current_frequency: Device clock frequency at present. Set by DVFS*/ > + unsigned long current_frequency; > + > + /** @fast_rate: Maximum device clock frequency. Set by DVFS */ > + unsigned long fast_rate; > }; > > /**

9 months, 3 weeks

Re: [PATCH v6 1/5] drm/panthor: introduce job cycle and timestamp accounting

by Steven Price

On 13/09/2024 13:42, Adrián Larumbe wrote: > Enable calculations of job submission times in clock cycles and wall > time. This is done by expanding the boilerplate command stream when running > a job to include instructions that compute said times right before and > after a user CS. > > A separate kernel BO is created per queue to store those values. Jobs can > access their sampled data through an index different from that of the > queue's ringbuffer. The reason for this is saving memory on the profiling > information kernel BO, since the amount of simultaneous profiled jobs we > can write into the queue's ringbuffer might be much smaller than for > regular jobs, as the former take more CSF instructions. > > This commit is done in preparation for enabling DRM fdinfo support in the > Panthor driver, which depends on the numbers calculated herein. > > A profile mode mask has been added that will in a future commit allow UM to > toggle performance metric sampling behaviour, which is disabled by default > to save power. When a ringbuffer CS is constructed, timestamp and cycling > sampling instructions are added depending on the enabled flags in the > profiling mask. > > A helper was provided that calculates the number of instructions for a > given set of enablement mask, and these are passed as the number of credits > when initialising a DRM scheduler job. > > Signed-off-by: Adrián Larumbe <adrian.larumbe(a)collabora.com> > Reviewed-by: Boris Brezillon <boris.brezillon(a)collabora.com> > Reviewed-by: Liviu Dudau <liviu.dudau(a)arm.com> Sorry I've been a bit slow about reviewing this. The kernel bot has pointed out a few minor issues and there's a few more below. > --- > drivers/gpu/drm/panthor/panthor_device.h | 22 ++ > drivers/gpu/drm/panthor/panthor_sched.c | 337 ++++++++++++++++++++--- > 2 files changed, 315 insertions(+), 44 deletions(-) > > diff --git a/drivers/gpu/drm/panthor/panthor_device.h b/drivers/gpu/drm/panthor/panthor_device.h > index e388c0472ba7..a48e30d0af30 100644 > --- a/drivers/gpu/drm/panthor/panthor_device.h > +++ b/drivers/gpu/drm/panthor/panthor_device.h > @@ -66,6 +66,25 @@ struct panthor_irq { > atomic_t suspended; > }; > > +/** > + * enum panthor_device_profiling_mode - Profiling state > + */ > +enum panthor_device_profiling_flags { > + /** @PANTHOR_DEVICE_PROFILING_DISABLED: Profiling is disabled. */ > + PANTHOR_DEVICE_PROFILING_DISABLED = 0, > + > + /** @PANTHOR_DEVICE_PROFILING_CYCLES: Sampling job cycles. */ > + PANTHOR_DEVICE_PROFILING_CYCLES = BIT(0), > + > + /** @PANTHOR_DEVICE_PROFILING_TIMESTAMP: Sampling job timestamp. */ > + PANTHOR_DEVICE_PROFILING_TIMESTAMP = BIT(1), > + > + /** @PANTHOR_DEVICE_PROFILING_ALL: Sampling everything. */ > + PANTHOR_DEVICE_PROFILING_ALL = > + PANTHOR_DEVICE_PROFILING_CYCLES | > + PANTHOR_DEVICE_PROFILING_TIMESTAMP, > +}; > + > /** > * struct panthor_device - Panthor device > */ > @@ -162,6 +181,9 @@ struct panthor_device { > */ > struct page *dummy_latest_flush; > } pm; > + > + /** @profile_mask: User-set profiling flags for job accounting. */ > + u32 profile_mask; > }; > > /** > diff --git a/drivers/gpu/drm/panthor/panthor_sched.c b/drivers/gpu/drm/panthor/panthor_sched.c > index 42afdf0ddb7e..bcba52558f1e 100644 > --- a/drivers/gpu/drm/panthor/panthor_sched.c > +++ b/drivers/gpu/drm/panthor/panthor_sched.c > @@ -93,6 +93,9 @@ > #define MIN_CSGS 3 > #define MAX_CSG_PRIO 0xf > > +#define NUM_INSTRS_PER_CACHE_LINE (64 / sizeof(u64)) > +#define MAX_INSTRS_PER_JOB 24 > + > struct panthor_group; > > /** > @@ -476,6 +479,18 @@ struct panthor_queue { > */ > struct list_head in_flight_jobs; > } fence_ctx; > + > + /** @profiling_info: Job profiling data slots and access information. */ kerneldoc name doesn't match: s/profiling_info/profiling/ > + struct { > + /** @slots: Kernel BO holding the slots. */ > + struct panthor_kernel_bo *slots; > + > + /** @slot_count: Number of jobs ringbuffer can hold at once. */ > + u32 slot_count; > + > + /** @profiling_seqno: Index of the next available profiling information slot. */ s/profiling_seqno/seqno/ > + u32 seqno; > + } profiling; > }; > > /** > @@ -661,6 +676,18 @@ struct panthor_group { > struct list_head wait_node; > }; > > +struct panthor_job_profiling_data { > + struct { > + u64 before; > + u64 after; > + } cycles; > + > + struct { > + u64 before; > + u64 after; > + } time; > +}; > + > /** > * group_queue_work() - Queue a group work > * @group: Group to queue the work for. > @@ -774,6 +801,15 @@ struct panthor_job { > > /** @done_fence: Fence signaled when the job is finished or cancelled. */ > struct dma_fence *done_fence; > + > + /** @profiling: Job profiling information. */ > + struct { > + /** @mask: Current device job profiling enablement bitmask. */ > + u32 mask; > + > + /** @slot: Job index in the profiling slots BO. */ > + u32 slot; > + } profiling; > }; > > static void > @@ -838,6 +874,7 @@ static void group_free_queue(struct panthor_group *group, struct panthor_queue * > > panthor_kernel_bo_destroy(queue->ringbuf); > panthor_kernel_bo_destroy(queue->iface.mem); > + panthor_kernel_bo_destroy(queue->profiling.slots); > > /* Release the last_fence we were holding, if any. */ > dma_fence_put(queue->fence_ctx.last_fence); > @@ -1982,8 +2019,6 @@ tick_ctx_init(struct panthor_scheduler *sched, > } > } > > -#define NUM_INSTRS_PER_SLOT 16 > - > static void > group_term_post_processing(struct panthor_group *group) > { > @@ -2815,65 +2850,211 @@ static void group_sync_upd_work(struct work_struct *work) > group_put(group); > } > > -static struct dma_fence * > -queue_run_job(struct drm_sched_job *sched_job) > +struct panthor_job_ringbuf_instrs { > + u64 buffer[MAX_INSTRS_PER_JOB]; > + u32 count; > +}; > + > +struct panthor_job_instr { > + u32 profile_mask; > + u64 instr; > +}; > + > +#define JOB_INSTR(__prof, __instr) \ > + { \ > + .profile_mask = __prof, \ > + .instr = __instr, \ > + } > + > +static void > +copy_instrs_to_ringbuf(struct panthor_queue *queue, > + struct panthor_job *job, > + struct panthor_job_ringbuf_instrs *instrs) > +{ > + ssize_t ringbuf_size = panthor_kernel_bo_size(queue->ringbuf); > + u32 start = job->ringbuf.start & (ringbuf_size - 1); > + ssize_t size, written; > + > + /* > + * We need to write a whole slot, including any trailing zeroes > + * that may come at the end of it. Also, because instrs.buffer has > + * been zero-initialised, there's no need to pad it with 0's > + */ > + instrs->count = ALIGN(instrs->count, NUM_INSTRS_PER_CACHE_LINE); > + size = instrs->count * sizeof(u64); > + written = min(ringbuf_size - start, size); This causes a signedness error - I think the easiest thing is to just make size/written unsigned. You might also want to consider a WARN_ON(size > ringbuf_size) or similar to catch that (impossible) case which would cause the below logic to fail. > + > + memcpy(queue->ringbuf->kmap + start, instrs->buffer, written); > + > + if (written < size) > + memcpy(queue->ringbuf->kmap, > + &instrs->buffer[(ringbuf_size - start)/sizeof(u64)], ^^^^^^^^^^^^^^^^^^^^ I believe this is equal to 'written', so this can be rewritten as: + &instrs->buffer[written / sizeof(u64)], which I find clearer, especially since you've used 'written' just below. > + size - written); > +} > + > +struct panthor_job_cs_params { > + u32 profile_mask; > + u64 addr_reg; u64 val_reg; > + u64 cycle_reg; u64 time_reg; > + u64 sync_addr; u64 times_addr; > + u64 cs_start; u64 cs_size; > + u32 last_flush; u32 waitall_mask; > +}; > + > +static void > +get_job_cs_params(struct panthor_job *job, struct panthor_job_cs_params *params) > { > - struct panthor_job *job = container_of(sched_job, struct panthor_job, base); > struct panthor_group *group = job->group; > struct panthor_queue *queue = group->queues[job->queue_idx]; > struct panthor_device *ptdev = group->ptdev; > struct panthor_scheduler *sched = ptdev->scheduler; > - u32 ringbuf_size = panthor_kernel_bo_size(queue->ringbuf); > - u32 ringbuf_insert = queue->iface.input->insert & (ringbuf_size - 1); > - u64 addr_reg = ptdev->csif_info.cs_reg_count - > - ptdev->csif_info.unpreserved_cs_reg_count; > - u64 val_reg = addr_reg + 2; > - u64 sync_addr = panthor_kernel_bo_gpuva(group->syncobjs) + > - job->queue_idx * sizeof(struct panthor_syncobj_64b); > - u32 waitall_mask = GENMASK(sched->sb_slot_count - 1, 0); > - struct dma_fence *done_fence; > - int ret; > > - u64 call_instrs[NUM_INSTRS_PER_SLOT] = { > + params->addr_reg = ptdev->csif_info.cs_reg_count - > + ptdev->csif_info.unpreserved_cs_reg_count; > + params->val_reg = params->addr_reg + 2; > + params->cycle_reg = params->addr_reg; > + params->time_reg = params->val_reg; > + > + params->sync_addr = panthor_kernel_bo_gpuva(group->syncobjs) + > + job->queue_idx * sizeof(struct panthor_syncobj_64b); > + params->times_addr = panthor_kernel_bo_gpuva(queue->profiling.slots) + > + (job->profiling.slot * sizeof(struct panthor_job_profiling_data)); > + params->waitall_mask = GENMASK(sched->sb_slot_count - 1, 0); > + > + params->cs_start = job->call_info.start; > + params->cs_size = job->call_info.size; > + params->last_flush = job->call_info.latest_flush; > + > + params->profile_mask = job->profiling.mask; > +} > + > +static void > +prepare_job_instrs(const struct panthor_job_cs_params *params, > + struct panthor_job_ringbuf_instrs *instrs) > +{ > + const struct panthor_job_instr instr_seq[] = { > /* MOV32 rX+2, cs.latest_flush */ > - (2ull << 56) | (val_reg << 48) | job->call_info.latest_flush, > + JOB_INSTR(PANTHOR_DEVICE_PROFILING_DISABLED, > + (2ull << 56) | (params->val_reg << 48) | params->last_flush), > > /* FLUSH_CACHE2.clean_inv_all.no_wait.signal(0) rX+2 */ > - (36ull << 56) | (0ull << 48) | (val_reg << 40) | (0 << 16) | 0x233, > + JOB_INSTR(PANTHOR_DEVICE_PROFILING_DISABLED, > + (36ull << 56) | (0ull << 48) | (params->val_reg << 40) | (0 << 16) | 0x233), > + > + /* MOV48 rX:rX+1, cycles_offset */ > + JOB_INSTR(PANTHOR_DEVICE_PROFILING_CYCLES, > + (1ull << 56) | (params->cycle_reg << 48) | > + (params->times_addr + > + offsetof(struct panthor_job_profiling_data, cycles.before))), > + /* STORE_STATE cycles */ > + JOB_INSTR(PANTHOR_DEVICE_PROFILING_CYCLES, > + (40ull << 56) | (params->cycle_reg << 40) | (1ll << 32)), > + /* MOV48 rX:rX+1, time_offset */ > + JOB_INSTR(PANTHOR_DEVICE_PROFILING_TIMESTAMP, > + (1ull << 56) | (params->time_reg << 48) | > + (params->times_addr + > + offsetof(struct panthor_job_profiling_data, time.before))), > + /* STORE_STATE timer */ > + JOB_INSTR(PANTHOR_DEVICE_PROFILING_TIMESTAMP, > + (40ull << 56) | (params->time_reg << 40) | (0ll << 32)), > > /* MOV48 rX:rX+1, cs.start */ > - (1ull << 56) | (addr_reg << 48) | job->call_info.start, > - > + JOB_INSTR(PANTHOR_DEVICE_PROFILING_DISABLED, > + (1ull << 56) | (params->addr_reg << 48) | params->cs_start), > /* MOV32 rX+2, cs.size */ > - (2ull << 56) | (val_reg << 48) | job->call_info.size, > - > + JOB_INSTR(PANTHOR_DEVICE_PROFILING_DISABLED, > + (2ull << 56) | (params->val_reg << 48) | params->cs_size), > /* WAIT(0) => waits for FLUSH_CACHE2 instruction */ > - (3ull << 56) | (1 << 16), > - > + JOB_INSTR(PANTHOR_DEVICE_PROFILING_DISABLED, (3ull << 56) | (1 << 16)), > /* CALL rX:rX+1, rX+2 */ > - (32ull << 56) | (addr_reg << 40) | (val_reg << 32), > + JOB_INSTR(PANTHOR_DEVICE_PROFILING_DISABLED, > + (32ull << 56) | (params->addr_reg << 40) | (params->val_reg << 32)), > + > + /* MOV48 rX:rX+1, cycles_offset */ > + JOB_INSTR(PANTHOR_DEVICE_PROFILING_CYCLES, > + (1ull << 56) | (params->cycle_reg << 48) | > + (params->times_addr + > + offsetof(struct panthor_job_profiling_data, cycles.after))), > + /* STORE_STATE cycles */ > + JOB_INSTR(PANTHOR_DEVICE_PROFILING_CYCLES, > + (40ull << 56) | (params->cycle_reg << 40) | (1ll << 32)), > + > + /* MOV48 rX:rX+1, time_offset */ > + JOB_INSTR(PANTHOR_DEVICE_PROFILING_TIMESTAMP, > + (1ull << 56) | (params->time_reg << 48) | > + (params->times_addr + > + offsetof(struct panthor_job_profiling_data, time.after))), > + /* STORE_STATE timer */ > + JOB_INSTR(PANTHOR_DEVICE_PROFILING_TIMESTAMP, > + (40ull << 56) | (params->time_reg << 40) | (0ll << 32)), > > /* MOV48 rX:rX+1, sync_addr */ > - (1ull << 56) | (addr_reg << 48) | sync_addr, > - > + JOB_INSTR(PANTHOR_DEVICE_PROFILING_DISABLED, > + (1ull << 56) | (params->addr_reg << 48) | params->sync_addr), > /* MOV48 rX+2, #1 */ > - (1ull << 56) | (val_reg << 48) | 1, > - > + JOB_INSTR(PANTHOR_DEVICE_PROFILING_DISABLED, > + (1ull << 56) | (params->val_reg << 48) | 1), > /* WAIT(all) */ > - (3ull << 56) | (waitall_mask << 16), > - > + JOB_INSTR(PANTHOR_DEVICE_PROFILING_DISABLED, > + (3ull << 56) | (params->waitall_mask << 16)), > /* SYNC_ADD64.system_scope.propage_err.nowait rX:rX+1, rX+2*/ > - (51ull << 56) | (0ull << 48) | (addr_reg << 40) | (val_reg << 32) | (0 << 16) | 1, > - > + JOB_INSTR(PANTHOR_DEVICE_PROFILING_DISABLED, > + (51ull << 56) | (0ull << 48) | (params->addr_reg << 40) | > + (params->val_reg << 32) | (0 << 16) | 1), > /* ERROR_BARRIER, so we can recover from faults at job > * boundaries. > */ > - (47ull << 56), > + JOB_INSTR(PANTHOR_DEVICE_PROFILING_DISABLED, (47ull << 56)), Personally I think this would be easier to read if instead of using JOB_INSTR directly we define a few extra helper macros of the below form: #define JOB_INSTR_ALWAYS(instr) \ JOB_INSTR(PANTHOR_DEVICE_PROFILING_DISABLED, instr) #define JOB_INSTR_TIMESTAMP(instr) \ JOB_INSTR(PANTHOR_DEVICE_PROFILING_TIMESTAMP, instr) #define JOB_INSTR_CYCLES(instr) \ JOB_INSTR(PANTHOR_DEVICE_PROFILING_CYCLES, instr) In particular I think the ...PROFILING_DISABLED flag is somewhat confusing because actually means "always" not only when profiling is disabled. > + }; > + u32 pad; > + > + /* NEED to be cacheline aligned to please the prefetcher. */ > + static_assert(sizeof(instrs->buffer) % 64 == 0, > + "panthor_job_ringbuf_instrs::buffer is not aligned on a cacheline"); > + > + /* Make sure we have enough storage to store the whole sequence. */ > + static_assert(ALIGN(ARRAY_SIZE(instr_seq), NUM_INSTRS_PER_CACHE_LINE) == > + ARRAY_SIZE(instrs->buffer), > + "instr_seq vs panthor_job_ringbuf_instrs::buffer size mismatch"); > + > + for (u32 i = 0; i < ARRAY_SIZE(instr_seq); i++) { > + /* If the profile mask of this instruction is not enabled, skip it. */ > + if (instr_seq[i].profile_mask && > + !(instr_seq[i].profile_mask & params->profile_mask)) > + continue; > + > + instrs->buffer[instrs->count++] = instr_seq[i].instr; > + } > + > + pad = ALIGN(instrs->count, NUM_INSTRS_PER_CACHE_LINE); > + memset(&instrs->buffer[instrs->count], 0, > + (pad - instrs->count) * sizeof(instrs->buffer[0])); > + instrs->count = pad; > +} > + > +static u32 calc_job_credits(u32 profile_mask) > +{ > + struct panthor_job_ringbuf_instrs instrs; You need to initialize this as instrs.count is read by prepare_job_instrs(). > + struct panthor_job_cs_params params = { > + .profile_mask = profile_mask, > }; > > - /* Need to be cacheline aligned to please the prefetcher. */ > - static_assert(sizeof(call_instrs) % 64 == 0, > - "call_instrs is not aligned on a cacheline"); > + prepare_job_instrs(&params, &instrs); > + return instrs.count; > +} > + > +static struct dma_fence * > +queue_run_job(struct drm_sched_job *sched_job) > +{ > + struct panthor_job *job = container_of(sched_job, struct panthor_job, base); > + struct panthor_group *group = job->group; > + struct panthor_queue *queue = group->queues[job->queue_idx]; > + struct panthor_device *ptdev = group->ptdev; > + struct panthor_scheduler *sched = ptdev->scheduler; > + struct panthor_job_ringbuf_instrs instrs; > + struct panthor_job_cs_params cs_params; > + struct dma_fence *done_fence; > + int ret; > > /* Stream size is zero, nothing to do except making sure all previously > * submitted jobs are done before we signal the > @@ -2900,17 +3081,23 @@ queue_run_job(struct drm_sched_job *sched_job) > queue->fence_ctx.id, > atomic64_inc_return(&queue->fence_ctx.seqno)); > > - memcpy(queue->ringbuf->kmap + ringbuf_insert, > - call_instrs, sizeof(call_instrs)); > + job->profiling.slot = queue->profiling.seqno++; > + if (queue->profiling.seqno == queue->profiling.slot_count) > + queue->profiling.seqno = 0; > + > + job->ringbuf.start = queue->iface.input->insert; > + > + get_job_cs_params(job, &cs_params); > + prepare_job_instrs(&cs_params, &instrs); > + copy_instrs_to_ringbuf(queue, job, &instrs); > + > + job->ringbuf.end = job->ringbuf.start + (instrs.count * sizeof(u64)); > > panthor_job_get(&job->base); > spin_lock(&queue->fence_ctx.lock); > list_add_tail(&job->node, &queue->fence_ctx.in_flight_jobs); > spin_unlock(&queue->fence_ctx.lock); > > - job->ringbuf.start = queue->iface.input->insert; > - job->ringbuf.end = job->ringbuf.start + sizeof(call_instrs); > - > /* Make sure the ring buffer is updated before the INSERT > * register. > */ > @@ -3003,6 +3190,34 @@ static const struct drm_sched_backend_ops panthor_queue_sched_ops = { > .free_job = queue_free_job, > }; > > +static u32 calc_profiling_ringbuf_num_slots(struct panthor_device *ptdev, > + u32 cs_ringbuf_size) > +{ > + u32 min_profiled_job_instrs = U32_MAX; > + u32 last_flag = fls(PANTHOR_DEVICE_PROFILING_ALL); > + > + /* > + * We want to calculate the minimum size of a profiled job's CS, > + * because since they need additional instructions for the sampling > + * of performance metrics, they might take up further slots in > + * the queue's ringbuffer. This means we might not need as many job > + * slots for keeping track of their profiling information. What we > + * need is the maximum number of slots we should allocate to this end, > + * which matches the maximum number of profiled jobs we can place > + * simultaneously in the queue's ring buffer. > + * That has to be calculated separately for every single job profiling > + * flag, but not in the case job profiling is disabled, since unprofiled > + * jobs don't need to keep track of this at all. > + */ > + for (u32 i = 0; i < last_flag; i++) { > + if (BIT(i) & PANTHOR_DEVICE_PROFILING_ALL) > + min_profiled_job_instrs = > + min(min_profiled_job_instrs, calc_job_credits(BIT(i))); > + } > + > + return DIV_ROUND_UP(cs_ringbuf_size, min_profiled_job_instrs * sizeof(u64)); > +} I may be missing something, but is there a situation where this is different to calc_job_credits(0)? AFAICT the infrastructure you've added can only add extra instructions to the no-flags case - whereas this implies you're thinking that instructions may also be removed (or replaced). Steve > + > static struct panthor_queue * > group_create_queue(struct panthor_group *group, > const struct drm_panthor_queue_create *args) > @@ -3056,9 +3271,35 @@ group_create_queue(struct panthor_group *group, > goto err_free_queue; > } > > + queue->profiling.slot_count = > + calc_profiling_ringbuf_num_slots(group->ptdev, args->ringbuf_size); > + > + queue->profiling.slots = > + panthor_kernel_bo_create(group->ptdev, group->vm, > + queue->profiling.slot_count * > + sizeof(struct panthor_job_profiling_data), > + DRM_PANTHOR_BO_NO_MMAP, > + DRM_PANTHOR_VM_BIND_OP_MAP_NOEXEC | > + DRM_PANTHOR_VM_BIND_OP_MAP_UNCACHED, > + PANTHOR_VM_KERNEL_AUTO_VA); > + > + if (IS_ERR(queue->profiling.slots)) { > + ret = PTR_ERR(queue->profiling.slots); > + goto err_free_queue; > + } > + > + ret = panthor_kernel_bo_vmap(queue->profiling.slots); > + if (ret) > + goto err_free_queue; > + > + /* > + * Credit limit argument tells us the total number of instructions > + * across all CS slots in the ringbuffer, with some jobs requiring > + * twice as many as others, depending on their profiling status. > + */ > ret = drm_sched_init(&queue->scheduler, &panthor_queue_sched_ops, > group->ptdev->scheduler->wq, 1, > - args->ringbuf_size / (NUM_INSTRS_PER_SLOT * sizeof(u64)), > + args->ringbuf_size / sizeof(u64), > 0, msecs_to_jiffies(JOB_TIMEOUT_MS), > group->ptdev->reset.wq, > NULL, "panthor-queue", group->ptdev->base.dev); > @@ -3354,6 +3595,7 @@ panthor_job_create(struct panthor_file *pfile, > { > struct panthor_group_pool *gpool = pfile->groups; > struct panthor_job *job; > + u32 credits; > int ret; > > if (qsubmit->pad) > @@ -3407,9 +3649,16 @@ panthor_job_create(struct panthor_file *pfile, > } > } > > + job->profiling.mask = pfile->ptdev->profile_mask; > + credits = calc_job_credits(job->profiling.mask); > + if (credits == 0) { > + ret = -EINVAL; > + goto err_put_job; > + } > + > ret = drm_sched_job_init(&job->base, > &job->group->queues[job->queue_idx]->entity, > - 1, job->group); > + credits, job->group); > if (ret) > goto err_put_job; >

9 months, 3 weeks

Re: [PATCH v6 1/5] drm/panthor: introduce job cycle and timestamp accounting

by kernel test robot

Hi Adrián, kernel test robot noticed the following build errors: [auto build test ERROR on linus/master] [also build test ERROR on v6.11 next-20240913] [cannot apply to drm-misc/drm-misc-next] [If your patch is applied to the wrong git tree, kindly drop us a note. And when submitting patch, we suggest to use '--base' as documented in https://git-scm.com/docs/git-format-patch#_base_tree_information] url: https://github.com/intel-lab-lkp/linux/commits/Adri-n-Larumbe/drm-panthor-i… base: linus/master patch link: https://lore.kernel.org/r/20240913124857.389630-2-adrian.larumbe%40collabor… patch subject: [PATCH v6 1/5] drm/panthor: introduce job cycle and timestamp accounting config: i386-buildonly-randconfig-002-20240915 (https://download.01.org/0day-ci/archive/20240916/202409160050.m7VEd3pY-lkp@…) compiler: clang version 18.1.8 (https://github.com/llvm/llvm-project 3b5b5c1ec4a3095ab096dd780e84d7ab81f3d7ff) reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20240916/202409160050.m7VEd3pY-lkp@…) If you fix the issue in a separate patch/commit (i.e. not just a new version of the same patch/commit), kindly add following tags | Reported-by: kernel test robot <lkp(a)intel.com> | Closes: https://lore.kernel.org/oe-kbuild-all/202409160050.m7VEd3pY-lkp@intel.com/ All errors (new ones prefixed by >>): >> ld.lld: error: call to __compiletime_assert_341 marked "dontcall-error": min(ringbuf_size - start, size) signedness error -- 0-DAY CI Kernel Test Service https://github.com/intel/lkp-tests/wiki

9 months, 3 weeks

Re: [PATCH v6 1/5] drm/panthor: introduce job cycle and timestamp accounting

by kernel test robot

Hi Adrián, kernel test robot noticed the following build errors: [auto build test ERROR on linus/master] [also build test ERROR on v6.11-rc7 next-20240913] [cannot apply to drm-misc/drm-misc-next] [If your patch is applied to the wrong git tree, kindly drop us a note. And when submitting patch, we suggest to use '--base' as documented in https://git-scm.com/docs/git-format-patch#_base_tree_information] url: https://github.com/intel-lab-lkp/linux/commits/Adri-n-Larumbe/drm-panthor-i… base: linus/master patch link: https://lore.kernel.org/r/20240913124857.389630-2-adrian.larumbe%40collabor… patch subject: [PATCH v6 1/5] drm/panthor: introduce job cycle and timestamp accounting config: i386-buildonly-randconfig-003-20240915 (https://download.01.org/0day-ci/archive/20240915/202409152243.r3t2jdOJ-lkp@…) compiler: clang version 18.1.8 (https://github.com/llvm/llvm-project 3b5b5c1ec4a3095ab096dd780e84d7ab81f3d7ff) reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20240915/202409152243.r3t2jdOJ-lkp@…) If you fix the issue in a separate patch/commit (i.e. not just a new version of the same patch/commit), kindly add following tags | Reported-by: kernel test robot <lkp(a)intel.com> | Closes: https://lore.kernel.org/oe-kbuild-all/202409152243.r3t2jdOJ-lkp@intel.com/ All errors (new ones prefixed by >>): >> drivers/gpu/drm/panthor/panthor_sched.c:2885:12: error: call to '__compiletime_assert_371' declared with 'error' attribute: min(ringbuf_size - start, size) signedness error 2885 | written = min(ringbuf_size - start, size); | ^ include/linux/minmax.h:129:19: note: expanded from macro 'min' 129 | #define min(x, y) __careful_cmp(min, x, y) | ^ include/linux/minmax.h:105:2: note: expanded from macro '__careful_cmp' 105 | __careful_cmp_once(op, x, y, __UNIQUE_ID(x_), __UNIQUE_ID(y_)) | ^ include/linux/minmax.h:100:2: note: expanded from macro '__careful_cmp_once' 100 | BUILD_BUG_ON_MSG(!__types_ok(x,y,ux,uy), \ | ^ note: (skipping 2 expansions in backtrace; use -fmacro-backtrace-limit=0 to see all) include/linux/compiler_types.h:498:2: note: expanded from macro '_compiletime_assert' 498 | __compiletime_assert(condition, msg, prefix, suffix) | ^ include/linux/compiler_types.h:491:4: note: expanded from macro '__compiletime_assert' 491 | prefix ## suffix(); \ | ^ <scratch space>:68:1: note: expanded from here 68 | __compiletime_assert_371 | ^ 1 error generated. vim +2885 drivers/gpu/drm/panthor/panthor_sched.c 2862 2863 #define JOB_INSTR(__prof, __instr) \ 2864 { \ 2865 .profile_mask = __prof, \ 2866 .instr = __instr, \ 2867 } 2868 2869 static void 2870 copy_instrs_to_ringbuf(struct panthor_queue *queue, 2871 struct panthor_job *job, 2872 struct panthor_job_ringbuf_instrs *instrs) 2873 { 2874 ssize_t ringbuf_size = panthor_kernel_bo_size(queue->ringbuf); 2875 u32 start = job->ringbuf.start & (ringbuf_size - 1); 2876 ssize_t size, written; 2877 2878 /* 2879 * We need to write a whole slot, including any trailing zeroes 2880 * that may come at the end of it. Also, because instrs.buffer has 2881 * been zero-initialised, there's no need to pad it with 0's 2882 */ 2883 instrs->count = ALIGN(instrs->count, NUM_INSTRS_PER_CACHE_LINE); 2884 size = instrs->count * sizeof(u64); > 2885 written = min(ringbuf_size - start, size); 2886 2887 memcpy(queue->ringbuf->kmap + start, instrs->buffer, written); 2888 2889 if (written < size) 2890 memcpy(queue->ringbuf->kmap, 2891 &instrs->buffer[(ringbuf_size - start)/sizeof(u64)], 2892 size - written); 2893 } 2894 -- 0-DAY CI Kernel Test Service https://github.com/intel/lkp-tests/wiki

9 months, 3 weeks

Re: [PATCH v6 1/5] drm/panthor: introduce job cycle and timestamp accounting

by kernel test robot

Hi Adrián, kernel test robot noticed the following build warnings: [auto build test WARNING on linus/master] [also build test WARNING on v6.11-rc7 next-20240913] [cannot apply to drm-misc/drm-misc-next] [If your patch is applied to the wrong git tree, kindly drop us a note. And when submitting patch, we suggest to use '--base' as documented in https://git-scm.com/docs/git-format-patch#_base_tree_information] url: https://github.com/intel-lab-lkp/linux/commits/Adri-n-Larumbe/drm-panthor-i… base: linus/master patch link: https://lore.kernel.org/r/20240913124857.389630-2-adrian.larumbe%40collabor… patch subject: [PATCH v6 1/5] drm/panthor: introduce job cycle and timestamp accounting config: alpha-allyesconfig (https://download.01.org/0day-ci/archive/20240914/202409140506.OBoqSiVk-lkp@…) compiler: alpha-linux-gcc (GCC) 13.3.0 reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20240914/202409140506.OBoqSiVk-lkp@…) If you fix the issue in a separate patch/commit (i.e. not just a new version of the same patch/commit), kindly add following tags | Reported-by: kernel test robot <lkp(a)intel.com> | Closes: https://lore.kernel.org/oe-kbuild-all/202409140506.OBoqSiVk-lkp@intel.com/ All warnings (new ones prefixed by >>): drivers/gpu/drm/panthor/panthor_sched.c:322: warning: Excess struct member 'runnable' description in 'panthor_scheduler' drivers/gpu/drm/panthor/panthor_sched.c:322: warning: Excess struct member 'idle' description in 'panthor_scheduler' drivers/gpu/drm/panthor/panthor_sched.c:322: warning: Excess struct member 'waiting' description in 'panthor_scheduler' drivers/gpu/drm/panthor/panthor_sched.c:322: warning: Excess struct member 'has_ref' description in 'panthor_scheduler' drivers/gpu/drm/panthor/panthor_sched.c:322: warning: Excess struct member 'in_progress' description in 'panthor_scheduler' drivers/gpu/drm/panthor/panthor_sched.c:322: warning: Excess struct member 'stopped_groups' description in 'panthor_scheduler' >> drivers/gpu/drm/panthor/panthor_sched.c:494: warning: Function parameter or struct member 'profiling' not described in 'panthor_queue' drivers/gpu/drm/panthor/panthor_sched.c:494: warning: Excess struct member 'mem' description in 'panthor_queue' drivers/gpu/drm/panthor/panthor_sched.c:494: warning: Excess struct member 'input' description in 'panthor_queue' drivers/gpu/drm/panthor/panthor_sched.c:494: warning: Excess struct member 'output' description in 'panthor_queue' drivers/gpu/drm/panthor/panthor_sched.c:494: warning: Excess struct member 'input_fw_va' description in 'panthor_queue' drivers/gpu/drm/panthor/panthor_sched.c:494: warning: Excess struct member 'output_fw_va' description in 'panthor_queue' drivers/gpu/drm/panthor/panthor_sched.c:494: warning: Excess struct member 'gpu_va' description in 'panthor_queue' drivers/gpu/drm/panthor/panthor_sched.c:494: warning: Excess struct member 'ref' description in 'panthor_queue' drivers/gpu/drm/panthor/panthor_sched.c:494: warning: Excess struct member 'gt' description in 'panthor_queue' drivers/gpu/drm/panthor/panthor_sched.c:494: warning: Excess struct member 'sync64' description in 'panthor_queue' drivers/gpu/drm/panthor/panthor_sched.c:494: warning: Excess struct member 'bo' description in 'panthor_queue' drivers/gpu/drm/panthor/panthor_sched.c:494: warning: Excess struct member 'offset' description in 'panthor_queue' drivers/gpu/drm/panthor/panthor_sched.c:494: warning: Excess struct member 'kmap' description in 'panthor_queue' drivers/gpu/drm/panthor/panthor_sched.c:494: warning: Excess struct member 'lock' description in 'panthor_queue' drivers/gpu/drm/panthor/panthor_sched.c:494: warning: Excess struct member 'id' description in 'panthor_queue' drivers/gpu/drm/panthor/panthor_sched.c:494: warning: Excess struct member 'seqno' description in 'panthor_queue' drivers/gpu/drm/panthor/panthor_sched.c:494: warning: Excess struct member 'last_fence' description in 'panthor_queue' drivers/gpu/drm/panthor/panthor_sched.c:494: warning: Excess struct member 'in_flight_jobs' description in 'panthor_queue' >> drivers/gpu/drm/panthor/panthor_sched.c:494: warning: Excess struct member 'profiling_info' description in 'panthor_queue' drivers/gpu/drm/panthor/panthor_sched.c:494: warning: Excess struct member 'slots' description in 'panthor_queue' drivers/gpu/drm/panthor/panthor_sched.c:494: warning: Excess struct member 'slot_count' description in 'panthor_queue' drivers/gpu/drm/panthor/panthor_sched.c:494: warning: Excess struct member 'profiling_seqno' description in 'panthor_queue' drivers/gpu/drm/panthor/panthor_sched.c:813: warning: Excess struct member 'start' description in 'panthor_job' drivers/gpu/drm/panthor/panthor_sched.c:813: warning: Excess struct member 'size' description in 'panthor_job' drivers/gpu/drm/panthor/panthor_sched.c:813: warning: Excess struct member 'latest_flush' description in 'panthor_job' drivers/gpu/drm/panthor/panthor_sched.c:813: warning: Excess struct member 'start' description in 'panthor_job' drivers/gpu/drm/panthor/panthor_sched.c:813: warning: Excess struct member 'end' description in 'panthor_job' >> drivers/gpu/drm/panthor/panthor_sched.c:813: warning: Excess struct member 'mask' description in 'panthor_job' >> drivers/gpu/drm/panthor/panthor_sched.c:813: warning: Excess struct member 'slot' description in 'panthor_job' drivers/gpu/drm/panthor/panthor_sched.c:1734: warning: Function parameter or struct member 'ptdev' not described in 'panthor_sched_report_fw_events' drivers/gpu/drm/panthor/panthor_sched.c:1734: warning: Function parameter or struct member 'events' not described in 'panthor_sched_report_fw_events' drivers/gpu/drm/panthor/panthor_sched.c:2626: warning: Function parameter or struct member 'ptdev' not described in 'panthor_sched_report_mmu_fault' vim +494 drivers/gpu/drm/panthor/panthor_sched.c de85488138247d Boris Brezillon 2024-02-29 397 de85488138247d Boris Brezillon 2024-02-29 398 /** @ringbuf: Command stream ring-buffer. */ de85488138247d Boris Brezillon 2024-02-29 399 struct panthor_kernel_bo *ringbuf; de85488138247d Boris Brezillon 2024-02-29 400 de85488138247d Boris Brezillon 2024-02-29 401 /** @iface: Firmware interface. */ de85488138247d Boris Brezillon 2024-02-29 402 struct { de85488138247d Boris Brezillon 2024-02-29 403 /** @mem: FW memory allocated for this interface. */ de85488138247d Boris Brezillon 2024-02-29 404 struct panthor_kernel_bo *mem; de85488138247d Boris Brezillon 2024-02-29 405 de85488138247d Boris Brezillon 2024-02-29 406 /** @input: Input interface. */ de85488138247d Boris Brezillon 2024-02-29 407 struct panthor_fw_ringbuf_input_iface *input; de85488138247d Boris Brezillon 2024-02-29 408 de85488138247d Boris Brezillon 2024-02-29 409 /** @output: Output interface. */ de85488138247d Boris Brezillon 2024-02-29 410 const struct panthor_fw_ringbuf_output_iface *output; de85488138247d Boris Brezillon 2024-02-29 411 de85488138247d Boris Brezillon 2024-02-29 412 /** @input_fw_va: FW virtual address of the input interface buffer. */ de85488138247d Boris Brezillon 2024-02-29 413 u32 input_fw_va; de85488138247d Boris Brezillon 2024-02-29 414 de85488138247d Boris Brezillon 2024-02-29 415 /** @output_fw_va: FW virtual address of the output interface buffer. */ de85488138247d Boris Brezillon 2024-02-29 416 u32 output_fw_va; de85488138247d Boris Brezillon 2024-02-29 417 } iface; de85488138247d Boris Brezillon 2024-02-29 418 de85488138247d Boris Brezillon 2024-02-29 419 /** de85488138247d Boris Brezillon 2024-02-29 420 * @syncwait: Stores information about the synchronization object this de85488138247d Boris Brezillon 2024-02-29 421 * queue is waiting on. de85488138247d Boris Brezillon 2024-02-29 422 */ de85488138247d Boris Brezillon 2024-02-29 423 struct { de85488138247d Boris Brezillon 2024-02-29 424 /** @gpu_va: GPU address of the synchronization object. */ de85488138247d Boris Brezillon 2024-02-29 425 u64 gpu_va; de85488138247d Boris Brezillon 2024-02-29 426 de85488138247d Boris Brezillon 2024-02-29 427 /** @ref: Reference value to compare against. */ de85488138247d Boris Brezillon 2024-02-29 428 u64 ref; de85488138247d Boris Brezillon 2024-02-29 429 de85488138247d Boris Brezillon 2024-02-29 430 /** @gt: True if this is a greater-than test. */ de85488138247d Boris Brezillon 2024-02-29 431 bool gt; de85488138247d Boris Brezillon 2024-02-29 432 de85488138247d Boris Brezillon 2024-02-29 433 /** @sync64: True if this is a 64-bit sync object. */ de85488138247d Boris Brezillon 2024-02-29 434 bool sync64; de85488138247d Boris Brezillon 2024-02-29 435 de85488138247d Boris Brezillon 2024-02-29 436 /** @bo: Buffer object holding the synchronization object. */ de85488138247d Boris Brezillon 2024-02-29 437 struct drm_gem_object *obj; de85488138247d Boris Brezillon 2024-02-29 438 de85488138247d Boris Brezillon 2024-02-29 439 /** @offset: Offset of the synchronization object inside @bo. */ de85488138247d Boris Brezillon 2024-02-29 440 u64 offset; de85488138247d Boris Brezillon 2024-02-29 441 de85488138247d Boris Brezillon 2024-02-29 442 /** de85488138247d Boris Brezillon 2024-02-29 443 * @kmap: Kernel mapping of the buffer object holding the de85488138247d Boris Brezillon 2024-02-29 444 * synchronization object. de85488138247d Boris Brezillon 2024-02-29 445 */ de85488138247d Boris Brezillon 2024-02-29 446 void *kmap; de85488138247d Boris Brezillon 2024-02-29 447 } syncwait; de85488138247d Boris Brezillon 2024-02-29 448 de85488138247d Boris Brezillon 2024-02-29 449 /** @fence_ctx: Fence context fields. */ de85488138247d Boris Brezillon 2024-02-29 450 struct { de85488138247d Boris Brezillon 2024-02-29 451 /** @lock: Used to protect access to all fences allocated by this context. */ de85488138247d Boris Brezillon 2024-02-29 452 spinlock_t lock; de85488138247d Boris Brezillon 2024-02-29 453 de85488138247d Boris Brezillon 2024-02-29 454 /** de85488138247d Boris Brezillon 2024-02-29 455 * @id: Fence context ID. de85488138247d Boris Brezillon 2024-02-29 456 * de85488138247d Boris Brezillon 2024-02-29 457 * Allocated with dma_fence_context_alloc(). de85488138247d Boris Brezillon 2024-02-29 458 */ de85488138247d Boris Brezillon 2024-02-29 459 u64 id; de85488138247d Boris Brezillon 2024-02-29 460 de85488138247d Boris Brezillon 2024-02-29 461 /** @seqno: Sequence number of the last initialized fence. */ de85488138247d Boris Brezillon 2024-02-29 462 atomic64_t seqno; de85488138247d Boris Brezillon 2024-02-29 463 7b6f9ec6ad5112 Boris Brezillon 2024-07-03 464 /** 7b6f9ec6ad5112 Boris Brezillon 2024-07-03 465 * @last_fence: Fence of the last submitted job. 7b6f9ec6ad5112 Boris Brezillon 2024-07-03 466 * 7b6f9ec6ad5112 Boris Brezillon 2024-07-03 467 * We return this fence when we get an empty command stream. 7b6f9ec6ad5112 Boris Brezillon 2024-07-03 468 * This way, we are guaranteed that all earlier jobs have completed 7b6f9ec6ad5112 Boris Brezillon 2024-07-03 469 * when drm_sched_job::s_fence::finished without having to feed 7b6f9ec6ad5112 Boris Brezillon 2024-07-03 470 * the CS ring buffer with a dummy job that only signals the fence. 7b6f9ec6ad5112 Boris Brezillon 2024-07-03 471 */ 7b6f9ec6ad5112 Boris Brezillon 2024-07-03 472 struct dma_fence *last_fence; 7b6f9ec6ad5112 Boris Brezillon 2024-07-03 473 de85488138247d Boris Brezillon 2024-02-29 474 /** de85488138247d Boris Brezillon 2024-02-29 475 * @in_flight_jobs: List containing all in-flight jobs. de85488138247d Boris Brezillon 2024-02-29 476 * de85488138247d Boris Brezillon 2024-02-29 477 * Used to keep track and signal panthor_job::done_fence when the de85488138247d Boris Brezillon 2024-02-29 478 * synchronization object attached to the queue is signaled. de85488138247d Boris Brezillon 2024-02-29 479 */ de85488138247d Boris Brezillon 2024-02-29 480 struct list_head in_flight_jobs; de85488138247d Boris Brezillon 2024-02-29 481 } fence_ctx; a706810cebb072 Adrián Larumbe 2024-09-13 482 a706810cebb072 Adrián Larumbe 2024-09-13 483 /** @profiling_info: Job profiling data slots and access information. */ a706810cebb072 Adrián Larumbe 2024-09-13 484 struct { a706810cebb072 Adrián Larumbe 2024-09-13 485 /** @slots: Kernel BO holding the slots. */ a706810cebb072 Adrián Larumbe 2024-09-13 486 struct panthor_kernel_bo *slots; a706810cebb072 Adrián Larumbe 2024-09-13 487 a706810cebb072 Adrián Larumbe 2024-09-13 488 /** @slot_count: Number of jobs ringbuffer can hold at once. */ a706810cebb072 Adrián Larumbe 2024-09-13 489 u32 slot_count; a706810cebb072 Adrián Larumbe 2024-09-13 490 a706810cebb072 Adrián Larumbe 2024-09-13 491 /** @profiling_seqno: Index of the next available profiling information slot. */ a706810cebb072 Adrián Larumbe 2024-09-13 492 u32 seqno; a706810cebb072 Adrián Larumbe 2024-09-13 493 } profiling; de85488138247d Boris Brezillon 2024-02-29 @494 }; de85488138247d Boris Brezillon 2024-02-29 495 -- 0-DAY CI Kernel Test Service https://github.com/intel/lkp-tests/wiki

9 months, 3 weeks

Re: (subset) [PATCH v7 00/16] Add audio support for the MediaTek Genio 350-evk board

by Mark Brown

On Mon, 22 Jul 2024 08:53:29 +0200, Alexandre Mergnat wrote: > This serie aim to add the following audio support for the Genio 350-evk: > - Playback > - 2ch Headset Jack (Earphone) > - 1ch Line-out Jack (Speaker) > - 8ch HDMI Tx > - Capture > - 1ch DMIC (On-board Digital Microphone) > - 1ch AMIC (On-board Analogic Microphone) > - 1ch Headset Jack (External Analogic Microphone) > > [...] Applied to https://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound.git for-next Thanks! [01/16] ASoC: dt-bindings: mediatek,mt8365-afe: Add audio afe document commit: ceb3ca2876243e3ea02f78b3d488b1f2d734de49 [02/16] ASoC: dt-bindings: mediatek,mt8365-mt6357: Add audio sound card document commit: 76d80dcdd55f70b28930edb97b96ee375e1cce5a [03/16] dt-bindings: mfd: mediatek: Add codec property for MT6357 PMIC commit: 761cab667898d86c04867948f1b7aec1090be796 [04/16] ASoC: mediatek: mt8365: Add common header commit: 38c7c9ddc74033406461d64e541bbc8268e77f73 [05/16] ASoC: mediatek: mt8365: Add audio clock control support commit: ef307b40b7f0042d54f020bccb3e728ced292282 [06/16] ASoC: mediatek: mt8365: Add I2S DAI support commit: 402bbb13a195caa83b3279ebecdabfb11ddee084 [07/16] ASoC: mediatek: mt8365: Add ADDA DAI support commit: 7c58c88e524180e8439acdfc44872325e7f6d33d [08/16] ASoC: mediatek: mt8365: Add DMIC DAI support commit: 1c50ec75ce6c0c6b5736499393e522f73e19d0cf [09/16] ASoC: mediatek: mt8365: Add PCM DAI support commit: 5097c0c8634d703e3c59cfb89831b7db9dc46339 [10/16] ASoc: mediatek: mt8365: Add a specific soundcard for EVK commit: 1bf6dbd75f7603dd026660bebf324f812200dc1b [11/16] ASoC: mediatek: mt8365: Add the AFE driver support commit: e1991d102bc2abb32331c462f8f3e77059c69578 [12/16] ASoC: codecs: add MT6357 support (no commit info) [13/16] ASoC: mediatek: Add MT8365 support (no commit info) All being well this means that it will be integrated into the linux-next tree (usually sometime in the next 24 hours) and sent to Linus during the next merge window (or sooner if it is a bug fix), however if problems are discovered then the patch may be dropped or reverted. You may get further e-mails resulting from automated or manual testing and review of the tree, please engage with people reporting problems and send followup patches addressing any issues that are reported if needed. If any updates are required or you are submitting further changes they should be sent as incremental updates against current git, existing patches will not be replaced. Please add any relevant lists and maintainers to the CCs when replying to this mail. Thanks, Mark

9 months, 3 weeks

Jump to page:

2025

2024

2023

2022

2021

2020

2019

2018

2017

2016

2015

2014

2013

2012

2011

Linaro-mm-sig