On 02/10/2024 09:38, Boris Brezillon wrote:
> On Tue, 24 Sep 2024 00:06:21 +0100
> Adrián Larumbe <adrian.larumbe(a)collabora.com> wrote:
>
>> +static u32 calc_profiling_ringbuf_num_slots(struct panthor_device *ptdev,
>> + u32 cs_ringbuf_size)
>> +{
>> + u32 min_profiled_job_instrs = U32_MAX;
>> + u32 last_flag = fls(PANTHOR_DEVICE_PROFILING_ALL);
>> +
>> + /*
>> +	 * We want to calculate the minimum size of a profiled job's CS:
>> +	 * since profiled jobs need additional instructions for the sampling
>> +	 * of performance metrics, they might take up further slots in
>> + * the queue's ringbuffer. This means we might not need as many job
>> + * slots for keeping track of their profiling information. What we
>> + * need is the maximum number of slots we should allocate to this end,
>> + * which matches the maximum number of profiled jobs we can place
>> + * simultaneously in the queue's ring buffer.
>> + * That has to be calculated separately for every single job profiling
>> + * flag, but not in the case job profiling is disabled, since unprofiled
>> + * jobs don't need to keep track of this at all.
>> + */
>> + for (u32 i = 0; i < last_flag; i++) {
>> + if (BIT(i) & PANTHOR_DEVICE_PROFILING_ALL)
>
> I'll get rid of this check when applying, as suggested by Steve. Steve,
> with this modification do you want me to add your R-b?
Yes, please do.
Thanks,
Steve
> BTW, I've also fixed a bunch of checkpatch errors/warnings, so you
> might want to run checkpatch --strict next time.
>
>> + min_profiled_job_instrs =
>> + min(min_profiled_job_instrs, calc_job_credits(BIT(i)));
>> + }
>> +
>> + return DIV_ROUND_UP(cs_ringbuf_size, min_profiled_job_instrs * sizeof(u64));
>> +}
Hi,
Am 30.09.24 um 21:38 schrieb Zichen Xie:
> Dear Linux Developers for DMA BUFFER SHARING FRAMEWORK,
>
> We are curious about the function 'dma_resv_get_fences' here:
> https://elixir.bootlin.com/linux/v6.11/source/drivers/dma-buf/dma-resv.c#L5…,
> and the logic below:
> ```
> dma_resv_for_each_fence_unlocked(&cursor, fence) {
>
> if (dma_resv_iter_is_restarted(&cursor)) {
> struct dma_fence **new_fences;
> unsigned int count;
>
> while (*num_fences)
> dma_fence_put((*fences)[--(*num_fences)]);
>
> count = cursor.num_fences + 1;
>
> /* Eventually re-allocate the array */
> new_fences = krealloc_array(*fences, count,
> sizeof(void *),
> GFP_KERNEL);
> if (count && !new_fences) {
> kfree(*fences);
> *fences = NULL;
> *num_fences = 0;
> dma_resv_iter_end(&cursor);
> return -ENOMEM;
> }
> *fences = new_fences;
> }
>
> (*fences)[(*num_fences)++] = dma_fence_get(fence);
> }
> ```
> The existing check 'if (count && !new_fences)' may fail if count==0,
> and 'krealloc_array' with count==0 is undefined behavior. The
> realloc may fail and return a NULL pointer, leading to a NULL Pointer
> Dereference in '(*fences)[(*num_fences)++] = dma_fence_get(fence);'
You already answered the question yourself: "count = cursor.num_fences +
1;", so count can never be 0.
What could theoretically be possible is that num_fences overflows, but
this value isn't userspace controllable and we would run into memory
allocation failures long before that happened.
But we could potentially remove this whole handling since if there are
no fences in the dma_resv object we don't enter the loop in the first place.
Regards,
Christian.
>
> Please correct us if we miss some key prerequisites for this function!
> Thank you very much!
On 27/09/2024 15:53, Adrián Larumbe wrote:
> On 25.09.2024 10:56, Steven Price wrote:
>> On 23/09/2024 21:43, Adrián Larumbe wrote:
>>> Hi Steve,
>>>
>>> On 23.09.2024 09:55, Steven Price wrote:
>>>> On 20/09/2024 23:36, Adrián Larumbe wrote:
>>>>> Hi Steve, thanks for the review.
>>>>
>>>> Hi Adrián,
>>>>
>>>>> I've applied all of your suggestions for the next patch series revision, so I'll
>>>>> only answer your question about the calc_profiling_ringbuf_num_slots
>>>>> function further down below.
>>>>>
>>>>
>>>> [...]
>>>>
>>>>>>> @@ -3003,6 +3190,34 @@ static const struct drm_sched_backend_ops panthor_queue_sched_ops = {
>>>>>>> .free_job = queue_free_job,
>>>>>>> };
>>>>>>>
>>>>>>> +static u32 calc_profiling_ringbuf_num_slots(struct panthor_device *ptdev,
>>>>>>> + u32 cs_ringbuf_size)
>>>>>>> +{
>>>>>>> + u32 min_profiled_job_instrs = U32_MAX;
>>>>>>> + u32 last_flag = fls(PANTHOR_DEVICE_PROFILING_ALL);
>>>>>>> +
>>>>>>> + /*
>>>>>>> +	 * We want to calculate the minimum size of a profiled job's CS:
>>>>>>> +	 * since profiled jobs need additional instructions for the sampling
>>>>>>> +	 * of performance metrics, they might take up further slots in
>>>>>>> + * the queue's ringbuffer. This means we might not need as many job
>>>>>>> + * slots for keeping track of their profiling information. What we
>>>>>>> + * need is the maximum number of slots we should allocate to this end,
>>>>>>> + * which matches the maximum number of profiled jobs we can place
>>>>>>> + * simultaneously in the queue's ring buffer.
>>>>>>> + * That has to be calculated separately for every single job profiling
>>>>>>> + * flag, but not in the case job profiling is disabled, since unprofiled
>>>>>>> + * jobs don't need to keep track of this at all.
>>>>>>> + */
>>>>>>> + for (u32 i = 0; i < last_flag; i++) {
>>>>>>> + if (BIT(i) & PANTHOR_DEVICE_PROFILING_ALL)
>>>>>>> + min_profiled_job_instrs =
>>>>>>> + min(min_profiled_job_instrs, calc_job_credits(BIT(i)));
>>>>>>> + }
>>>>>>> +
>>>>>>> + return DIV_ROUND_UP(cs_ringbuf_size, min_profiled_job_instrs * sizeof(u64));
>>>>>>> +}
>>>>>>
>>>>>> I may be missing something, but is there a situation where this is
>>>>>> different to calc_job_credits(0)? AFAICT the infrastructure you've added
>>>>>> can only add extra instructions to the no-flags case - whereas this
>>>>>> implies you're thinking that instructions may also be removed (or replaced).
>>>>>>
>>>>>> Steve
>>>>>
>>>>> Since we create a separate kernel BO to hold the profiling information slots, we
>>>>> need one that would be able to accommodate as many slots as the maximum number of
>>>>> profiled jobs we can insert simultaneously into the queue's ring buffer. Because
>>>>> profiled jobs always take more instructions than unprofiled ones, then we would
>>>>> usually need fewer slots than the number of unprofiled jobs we could insert at
>>>>> once in the ring buffer.
>>>>>
>>>>> Because we represent profiling metrics with a bit mask, then we need to test the
>>>>> size of the CS for every single metric enabled in isolation, since enabling more
>>>>> than one will always mean a bigger CS, and therefore fewer jobs tracked at once
>>>>> in the queue's ring buffer.
>>>>>
>>>>> In our case, calling calc_job_credits(0) would simply tell us the number of
>>>>> instructions we need for a normal job with no profiled features enabled, which
>>>>> would always require fewer instructions than profiled ones, and therefore more
>>>>> slots in the profiling info kernel BO. But we don't need to keep track of
>>>>> profiling numbers for unprofiled jobs, so there's no point in calculating this
>>>>> number.
>>>>>
>>>>> At first I was simply allocating a profiling info kernel BO as big as the number
>>>>> of simultaneous unprofiled job slots in the ring queue, but Boris pointed out
>>>>> that since queue ringbuffers can be as big as 2GiB, a lot of this memory would
>>>>> be wasted, since profiled jobs always require more slots because they hold more
>>>>> instructions, so fewer profiling slots in said kernel BO.
>>>>>
>>>>> The value of this approach will eventually manifest if we decide to keep track of
>>>>> more profiling metrics, since this code won't have to change at all, other than
>>>>> adding new profiling flags in the panthor_device_profiling_flags enum.
>>>>
>>>> Thanks for the detailed explanation. I think what I was missing is that
>>>> the loop is checking each bit flag independently and *not* checking
>>>> calc_job_credits(0).
>>>>
>>>> The check for (BIT(i) & PANTHOR_DEVICE_PROFILING_ALL) is probably what
>>>> confused me - that should be completely redundant. Or at least we need
>>>> something more intelligent if we have profiling bits which are not
>>>> mutually compatible.
>>>
>>> I thought of an alternative that would only test bits that are actually part of
>>> the mask:
>>>
>>> static u32 calc_profiling_ringbuf_num_slots(struct panthor_device *ptdev,
>>> u32 cs_ringbuf_size)
>>> {
>>> u32 min_profiled_job_instrs = U32_MAX;
>>> u32 profiling_mask = PANTHOR_DEVICE_PROFILING_ALL;
>>>
>>> while (profiling_mask) {
>>> u32 i = ffs(profiling_mask) - 1;
>>> profiling_mask &= ~BIT(i);
>>> min_profiled_job_instrs =
>>> min(min_profiled_job_instrs, calc_job_credits(BIT(i)));
>>> }
>>>
>>> return DIV_ROUND_UP(cs_ringbuf_size, min_profiled_job_instrs * sizeof(u64));
>>> }
>>>
>>> However, I don't think this would be more efficient: ffs() probably fetches
>>> the first set bit through register shifts, and I guess that would take somewhat
>>> longer than simply iterating over every bit up to the last one, even when also
>>> matching each bit against the whole mask in case future additions of performance
>>> metrics leave some of the lower-significance bits untouched.
>>
>> Efficiency isn't very important here - we're not on a fast path, so it's
>> more about ensuring the code is readable. I don't think the above is
>> more readable than the original for loop.
>>
>>> Regarding your question about mutual compatibility, I don't think that is an
>>> issue here, because we're testing bits in isolation. If in the future we find
>>> out that some of the values we're profiling cannot be sampled at once, we can
>>> add that logic to the sysfs knob handler, to make sure UM cannot set forbidden
>>> profiling masks.
>>
>> My comment about compatibility is because in the original above you were
>> calculating the top bit of PANTHOR_DEVICE_PROFILING_ALL:
>>
>>> u32 last_flag = fls(PANTHOR_DEVICE_PROFILING_ALL);
>>
>> then looping between 0 and that bit:
>>
>>> for (u32 i = 0; i < last_flag; i++) {
>>
>> So the test:
>>
>>> if (BIT(i) & PANTHOR_DEVICE_PROFILING_ALL)
>>
>> would only fail if PANTHOR_DEVICE_PROFILING_ALL had gaps in the bits
>> that it set. The only reason I can think for that to be true in the
>> future is if there is some sort of incompatibility - e.g. maybe there's
>> an old and new way of doing some form of profiling with the old way
>> being kept for backwards compatibility. But I suspect if/when that is
>> required we'll need to revisit this function anyway. So that 'if'
>> statement seems completely redundant (it's trivially always true).
>
> I think you're right about this. Would you be fine with the rest of the patch
> as it is in revision 8 if I also deleted this bitmask check?
Yes the rest of it looks fine.
Thanks,
Steve
>> Steve
>>
>>>> I'm also not entirely sure that the amount of RAM saved is significant,
>>>> but you've already written the code so we might as well have the saving ;)
>>>
>>> I think this was more evident before Boris suggested we reduce the basic slot
>>> size to that of a single cache line, because then the minimum profiled job
>>> might've taken twice as many ringbuffer slots as a nonprofiled one. In that
>>> case, we would need a BO half as big for holding the sampled data (in case the
>>> smallest profiled job's CS extended over the 16-instruction boundary).
>>> I still think this is a good idea so that in the future we don't need to worry
>>> about adjusting the code that deals with preparing the right boilerplate CS,
>>> since it'll only be a matter of adding new instructions inside prepare_job_instrs().
>>>
>>>> Thanks,
>>>> Steve
>>>>
>>>>> Regards,
>>>>> Adrian
>>>>>
>>>>>>> +
>>>>>>> static struct panthor_queue *
>>>>>>> group_create_queue(struct panthor_group *group,
>>>>>>> const struct drm_panthor_queue_create *args)
>>>>>>> @@ -3056,9 +3271,35 @@ group_create_queue(struct panthor_group *group,
>>>>>>> goto err_free_queue;
>>>>>>> }
>>>>>>>
>>>>>>> + queue->profiling.slot_count =
>>>>>>> + calc_profiling_ringbuf_num_slots(group->ptdev, args->ringbuf_size);
>>>>>>> +
>>>>>>> + queue->profiling.slots =
>>>>>>> + panthor_kernel_bo_create(group->ptdev, group->vm,
>>>>>>> + queue->profiling.slot_count *
>>>>>>> + sizeof(struct panthor_job_profiling_data),
>>>>>>> + DRM_PANTHOR_BO_NO_MMAP,
>>>>>>> + DRM_PANTHOR_VM_BIND_OP_MAP_NOEXEC |
>>>>>>> + DRM_PANTHOR_VM_BIND_OP_MAP_UNCACHED,
>>>>>>> + PANTHOR_VM_KERNEL_AUTO_VA);
>>>>>>> +
>>>>>>> + if (IS_ERR(queue->profiling.slots)) {
>>>>>>> + ret = PTR_ERR(queue->profiling.slots);
>>>>>>> + goto err_free_queue;
>>>>>>> + }
>>>>>>> +
>>>>>>> + ret = panthor_kernel_bo_vmap(queue->profiling.slots);
>>>>>>> + if (ret)
>>>>>>> + goto err_free_queue;
>>>>>>> +
>>>>>>> + /*
>>>>>>> + * Credit limit argument tells us the total number of instructions
>>>>>>> + * across all CS slots in the ringbuffer, with some jobs requiring
>>>>>>> + * twice as many as others, depending on their profiling status.
>>>>>>> + */
>>>>>>> ret = drm_sched_init(&queue->scheduler, &panthor_queue_sched_ops,
>>>>>>> group->ptdev->scheduler->wq, 1,
>>>>>>> - args->ringbuf_size / (NUM_INSTRS_PER_SLOT * sizeof(u64)),
>>>>>>> + args->ringbuf_size / sizeof(u64),
>>>>>>> 0, msecs_to_jiffies(JOB_TIMEOUT_MS),
>>>>>>> group->ptdev->reset.wq,
>>>>>>> NULL, "panthor-queue", group->ptdev->base.dev);
>>>>>>> @@ -3354,6 +3595,7 @@ panthor_job_create(struct panthor_file *pfile,
>>>>>>> {
>>>>>>> struct panthor_group_pool *gpool = pfile->groups;
>>>>>>> struct panthor_job *job;
>>>>>>> + u32 credits;
>>>>>>> int ret;
>>>>>>>
>>>>>>> if (qsubmit->pad)
>>>>>>> @@ -3407,9 +3649,16 @@ panthor_job_create(struct panthor_file *pfile,
>>>>>>> }
>>>>>>> }
>>>>>>>
>>>>>>> + job->profiling.mask = pfile->ptdev->profile_mask;
>>>>>>> + credits = calc_job_credits(job->profiling.mask);
>>>>>>> + if (credits == 0) {
>>>>>>> + ret = -EINVAL;
>>>>>>> + goto err_put_job;
>>>>>>> + }
>>>>>>> +
>>>>>>> ret = drm_sched_job_init(&job->base,
>>>>>>> &job->group->queues[job->queue_idx]->entity,
>>>>>>> - 1, job->group);
>>>>>>> + credits, job->group);
>>>>>>> if (ret)
>>>>>>> goto err_put_job;
>>>>>>>
>>>>>
Hi,
This patch set is based on top of Yong Wu's restricted heap patch set [1].
It's also a continuation on Olivier's Add dma-buf secure-heap patch set [2].
The Linaro restricted heap uses genalloc in the kernel to manage the heap
carveout. This differs from the Mediatek restricted heap, which
relies on the secure world to manage the carveout.
I've tried to address the comments on [2], but [1] introduces changes so I'm
afraid I've had to skip some comments.
This can be tested on QEMU with the following steps:
repo init -u https://github.com/jenswi-linaro/manifest.git -m qemu_v8.xml \
-b prototype/sdp-v1
repo sync -j8
cd build
make toolchains -j4
make all -j$(nproc)
make run-only
# login and at the prompt:
xtest --sdp-basic
https://optee.readthedocs.io/en/latest/building/prerequisites.html
lists the dependencies needed to build the above.
The tests are pretty basic, mostly checking that a Trusted Application in
the secure world can access and manipulate the memory.
Cheers,
Jens
[1] https://lore.kernel.org/dri-devel/20240515112308.10171-1-yong.wu@mediatek.c…
[2] https://lore.kernel.org/lkml/20220805135330.970-1-olivier.masse@nxp.com/
Changes since Olivier's post [2]:
* Based on Yong Wu's post [1] where much of dma-buf handling is done in
the generic restricted heap
* Simplifications and cleanup
* New commit message for "dma-buf: heaps: add Linaro restricted dmabuf heap
support"
* Replaced the word "secure" with "restricted" where applicable
Etienne Carriere (1):
tee: new ioctl to a register tee_shm from a dmabuf file descriptor
Jens Wiklander (2):
dma-buf: heaps: restricted_heap: add no_map attribute
dma-buf: heaps: add Linaro restricted dmabuf heap support
Olivier Masse (1):
dt-bindings: reserved-memory: add linaro,restricted-heap
.../linaro,restricted-heap.yaml | 56 ++++++
drivers/dma-buf/heaps/Kconfig | 10 ++
drivers/dma-buf/heaps/Makefile | 1 +
drivers/dma-buf/heaps/restricted_heap.c | 17 +-
drivers/dma-buf/heaps/restricted_heap.h | 2 +
.../dma-buf/heaps/restricted_heap_linaro.c | 165 ++++++++++++++++++
drivers/tee/tee_core.c | 38 ++++
drivers/tee/tee_shm.c | 104 ++++++++++-
include/linux/tee_drv.h | 11 ++
include/uapi/linux/tee.h | 29 +++
10 files changed, 426 insertions(+), 7 deletions(-)
create mode 100644 Documentation/devicetree/bindings/reserved-memory/linaro,restricted-heap.yaml
create mode 100644 drivers/dma-buf/heaps/restricted_heap_linaro.c
--
2.34.1
Hi Adrián,
kernel test robot noticed the following build errors:
[auto build test ERROR on linus/master]
[also build test ERROR on v6.11 next-20240927]
[cannot apply to drm-misc/drm-misc-next]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]
url: https://github.com/intel-lab-lkp/linux/commits/Adri-n-Larumbe/drm-panthor-i…
base: linus/master
patch link: https://lore.kernel.org/r/20240923230912.2207320-4-adrian.larumbe%40collabo…
patch subject: [PATCH v8 3/5] drm/panthor: add DRM fdinfo support
config: arm-randconfig-002-20240929 (https://download.01.org/0day-ci/archive/20240929/202409291048.zLqDeqpO-lkp@…)
compiler: arm-linux-gnueabi-gcc (GCC) 14.1.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20240929/202409291048.zLqDeqpO-lkp@…)
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp(a)intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202409291048.zLqDeqpO-lkp@intel.com/
All errors (new ones prefixed by >>):
In file included from include/linux/math64.h:6,
from include/linux/time.h:6,
from include/linux/stat.h:19,
from include/linux/module.h:13,
from drivers/gpu/drm/panthor/panthor_drv.c:7:
drivers/gpu/drm/panthor/panthor_drv.c: In function 'panthor_gpu_show_fdinfo':
>> drivers/gpu/drm/panthor/panthor_drv.c:1389:45: error: implicit declaration of function 'arch_timer_get_cntfrq' [-Wimplicit-function-declaration]
1389 | arch_timer_get_cntfrq()));
| ^~~~~~~~~~~~~~~~~~~~~
include/linux/math.h:40:39: note: in definition of macro 'DIV_ROUND_DOWN_ULL'
40 | ({ unsigned long long _tmp = (ll); do_div(_tmp, d); _tmp; })
| ^~
drivers/gpu/drm/panthor/panthor_drv.c:1388:28: note: in expansion of macro 'DIV_ROUND_UP_ULL'
1388 | DIV_ROUND_UP_ULL((pfile->stats.time * NSEC_PER_SEC),
| ^~~~~~~~~~~~~~~~
vim +/arch_timer_get_cntfrq +1389 drivers/gpu/drm/panthor/panthor_drv.c
1377
1378 static void panthor_gpu_show_fdinfo(struct panthor_device *ptdev,
1379 struct panthor_file *pfile,
1380 struct drm_printer *p)
1381 {
1382 if (ptdev->profile_mask & PANTHOR_DEVICE_PROFILING_ALL)
1383 panthor_fdinfo_gather_group_samples(pfile);
1384
1385 if (ptdev->profile_mask & PANTHOR_DEVICE_PROFILING_TIMESTAMP) {
1386 #ifdef CONFIG_ARM_ARCH_TIMER
1387 drm_printf(p, "drm-engine-panthor:\t%llu ns\n",
1388 DIV_ROUND_UP_ULL((pfile->stats.time * NSEC_PER_SEC),
> 1389 arch_timer_get_cntfrq()));
1390 #endif
1391 }
1392 if (ptdev->profile_mask & PANTHOR_DEVICE_PROFILING_CYCLES)
1393 drm_printf(p, "drm-cycles-panthor:\t%llu\n", pfile->stats.cycles);
1394
1395 drm_printf(p, "drm-maxfreq-panthor:\t%lu Hz\n", ptdev->fast_rate);
1396 drm_printf(p, "drm-curfreq-panthor:\t%lu Hz\n", ptdev->current_frequency);
1397 }
1398
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
Nothing wrong with this, I just didn't have time to double-check it
myself and then forgot about it.
Going to push it to drm-misc-next.
Regards,
Christian.
Am 23.09.24 um 11:22 schrieb Tommy Chiang:
> Ping.
> Please let me know if I'm doing something wrong.
>
> On Mon, Feb 19, 2024 at 11:00 AM Tommy Chiang <ototot(a)chromium.org> wrote:
>> Kindly ping :)
>>
>> On Fri, Jan 19, 2024 at 11:33 AM Tommy Chiang <ototot(a)chromium.org> wrote:
>>> This patch tries to improve the display of the code listing
>>> on The Linux Kernel documentation website for dma-buf [1] .
>>>
>>> Originally, it appears that it was attempting to escape
>>> the '*' character, but it looks like that's no longer necessary,
>>> so we are seeing something like '\*' on the website.
>>>
>>> This patch removes these unnecessary backslashes and adds syntax
>>> highlighting to improve the readability of the code listing.
>>>
>>> [1] https://docs.kernel.org/driver-api/dma-buf.html
>>>
>>> Signed-off-by: Tommy Chiang <ototot(a)chromium.org>
>>> ---
>>> drivers/dma-buf/dma-buf.c | 15 +++++++++------
>>> 1 file changed, 9 insertions(+), 6 deletions(-)
>>>
>>> diff --git a/drivers/dma-buf/dma-buf.c b/drivers/dma-buf/dma-buf.c
>>> index 8fe5aa67b167..e083a0ab06d7 100644
>>> --- a/drivers/dma-buf/dma-buf.c
>>> +++ b/drivers/dma-buf/dma-buf.c
>>> @@ -1282,10 +1282,12 @@ EXPORT_SYMBOL_NS_GPL(dma_buf_move_notify, DMA_BUF);
>>> * vmap interface is introduced. Note that on very old 32-bit architectures
>>> * vmalloc space might be limited and result in vmap calls failing.
>>> *
>>> - * Interfaces::
>>> + * Interfaces:
>>> *
>>> - * void \*dma_buf_vmap(struct dma_buf \*dmabuf, struct iosys_map \*map)
>>> - * void dma_buf_vunmap(struct dma_buf \*dmabuf, struct iosys_map \*map)
>>> + * .. code-block:: c
>>> + *
>>> + * void *dma_buf_vmap(struct dma_buf *dmabuf, struct iosys_map *map)
>>> + * void dma_buf_vunmap(struct dma_buf *dmabuf, struct iosys_map *map)
>>> *
>>> * The vmap call can fail if there is no vmap support in the exporter, or if
>>> * it runs out of vmalloc space. Note that the dma-buf layer keeps a reference
>>> @@ -1342,10 +1344,11 @@ EXPORT_SYMBOL_NS_GPL(dma_buf_move_notify, DMA_BUF);
>>> * enough, since adding interfaces to intercept pagefaults and allow pte
>>> * shootdowns would increase the complexity quite a bit.
>>> *
>>> - * Interface::
>>> + * Interface:
>>> + *
>>> + * .. code-block:: c
>>> *
>>> - * int dma_buf_mmap(struct dma_buf \*, struct vm_area_struct \*,
>>> - * unsigned long);
>>> + * int dma_buf_mmap(struct dma_buf *, struct vm_area_struct *, unsigned long);
>>> *
>>> * If the importing subsystem simply provides a special-purpose mmap call to
>>> * set up a mapping in userspace, calling do_mmap with &dma_buf.file will
>>> --
>>> 2.43.0.381.gb435a96ce8-goog
>>>
On 20/09/2024 23:36, Adrián Larumbe wrote:
> Hi Steve, thanks for the review.
Hi Adrián,
> I've applied all of your suggestions for the next patch series revision, so I'll
> only be answering to your question about the calc_profiling_ringbuf_num_slots
> function further down below.
>
[...]
>>> @@ -3003,6 +3190,34 @@ static const struct drm_sched_backend_ops panthor_queue_sched_ops = {
>>> .free_job = queue_free_job,
>>> };
>>>
>>> +static u32 calc_profiling_ringbuf_num_slots(struct panthor_device *ptdev,
>>> + u32 cs_ringbuf_size)
>>> +{
>>> + u32 min_profiled_job_instrs = U32_MAX;
>>> + u32 last_flag = fls(PANTHOR_DEVICE_PROFILING_ALL);
>>> +
>>> + /*
>>> + * We want to calculate the minimum size of a profiled job's CS,
>>> + * because since they need additional instructions for the sampling
>>> + * of performance metrics, they might take up further slots in
>>> + * the queue's ringbuffer. This means we might not need as many job
>>> + * slots for keeping track of their profiling information. What we
>>> + * need is the maximum number of slots we should allocate to this end,
>>> + * which matches the maximum number of profiled jobs we can place
>>> + * simultaneously in the queue's ring buffer.
>>> + * That has to be calculated separately for every single job profiling
>>> + * flag, but not in the case job profiling is disabled, since unprofiled
>>> + * jobs don't need to keep track of this at all.
>>> + */
>>> + for (u32 i = 0; i < last_flag; i++) {
>>> + if (BIT(i) & PANTHOR_DEVICE_PROFILING_ALL)
>>> + min_profiled_job_instrs =
>>> + min(min_profiled_job_instrs, calc_job_credits(BIT(i)));
>>> + }
>>> +
>>> + return DIV_ROUND_UP(cs_ringbuf_size, min_profiled_job_instrs * sizeof(u64));
>>> +}
>>
>> I may be missing something, but is there a situation where this is
>> different to calc_job_credits(0)? AFAICT the infrastructure you've added
>> can only add extra instructions to the no-flags case - whereas this
>> implies you're thinking that instructions may also be removed (or replaced).
>>
>> Steve
>
> Since we create a separate kernel BO to hold the profiling information slots, we
> need one able to accommodate as many slots as the maximum number of profiled
> jobs we can insert simultaneously into the queue's ring buffer. Because profiled
> jobs always take more instructions than unprofiled ones, we will usually need
> fewer slots than the number of unprofiled jobs we could insert at once in the
> ring buffer.
>
> Because we represent profiling metrics with a bit mask, we need to test the
> size of the CS for every single metric enabled in isolation, since enabling more
> than one will always mean a bigger CS, and therefore fewer jobs tracked at once
> in the queue's ring buffer.
>
> In our case, calling calc_job_credits(0) would simply tell us the number of
> instructions we need for a normal job with no profiled features enabled, which
> will always require fewer instructions than profiled ones, and therefore more
> slots in the profiling info kernel BO. But we don't need to keep track of
> profiling numbers for unprofiled jobs, so there's no point in calculating this
> number.
>
> At first I was simply allocating a profiling info kernel BO as big as the number
> of simultaneous unprofiled job slots in the ring queue, but Boris pointed out
> that since queue ring buffers can be as big as 2GiB, a lot of this memory would
> be wasted: profiled jobs always hold more instructions, so they take up more
> ring buffer space, and therefore fewer profiling slots are needed in said
> kernel BO.
>
> The value of this approach will eventually manifest if we decide to keep track
> of more profiling metrics, since this code won't have to change at all, other
> than adding new profiling flags in the panthor_device_profiling_flags enum.
Thanks for the detailed explanation. I think what I was missing is that
the loop is checking each bit flag independently and *not* checking
calc_job_credits(0).
The check for (BIT(i) & PANTHOR_DEVICE_PROFILING_ALL) is probably what
confused me - that should be completely redundant. Or at least we need
something more intelligent if we have profiling bits which are not
mutually compatible.
I'm also not entirely sure that the amount of RAM saved is significant,
but you've already written the code so we might as well have the saving ;)
Thanks,
Steve
> Regards,
> Adrian
>
>>> +
>>> static struct panthor_queue *
>>> group_create_queue(struct panthor_group *group,
>>> const struct drm_panthor_queue_create *args)
>>> @@ -3056,9 +3271,35 @@ group_create_queue(struct panthor_group *group,
>>> goto err_free_queue;
>>> }
>>>
>>> + queue->profiling.slot_count =
>>> + calc_profiling_ringbuf_num_slots(group->ptdev, args->ringbuf_size);
>>> +
>>> + queue->profiling.slots =
>>> + panthor_kernel_bo_create(group->ptdev, group->vm,
>>> + queue->profiling.slot_count *
>>> + sizeof(struct panthor_job_profiling_data),
>>> + DRM_PANTHOR_BO_NO_MMAP,
>>> + DRM_PANTHOR_VM_BIND_OP_MAP_NOEXEC |
>>> + DRM_PANTHOR_VM_BIND_OP_MAP_UNCACHED,
>>> + PANTHOR_VM_KERNEL_AUTO_VA);
>>> +
>>> + if (IS_ERR(queue->profiling.slots)) {
>>> + ret = PTR_ERR(queue->profiling.slots);
>>> + goto err_free_queue;
>>> + }
>>> +
>>> + ret = panthor_kernel_bo_vmap(queue->profiling.slots);
>>> + if (ret)
>>> + goto err_free_queue;
>>> +
>>> + /*
>>> + * Credit limit argument tells us the total number of instructions
>>> + * across all CS slots in the ringbuffer, with some jobs requiring
>>> + * twice as many as others, depending on their profiling status.
>>> + */
>>> ret = drm_sched_init(&queue->scheduler, &panthor_queue_sched_ops,
>>> group->ptdev->scheduler->wq, 1,
>>> - args->ringbuf_size / (NUM_INSTRS_PER_SLOT * sizeof(u64)),
>>> + args->ringbuf_size / sizeof(u64),
>>> 0, msecs_to_jiffies(JOB_TIMEOUT_MS),
>>> group->ptdev->reset.wq,
>>> NULL, "panthor-queue", group->ptdev->base.dev);
>>> @@ -3354,6 +3595,7 @@ panthor_job_create(struct panthor_file *pfile,
>>> {
>>> struct panthor_group_pool *gpool = pfile->groups;
>>> struct panthor_job *job;
>>> + u32 credits;
>>> int ret;
>>>
>>> if (qsubmit->pad)
>>> @@ -3407,9 +3649,16 @@ panthor_job_create(struct panthor_file *pfile,
>>> }
>>> }
>>>
>>> + job->profiling.mask = pfile->ptdev->profile_mask;
>>> + credits = calc_job_credits(job->profiling.mask);
>>> + if (credits == 0) {
>>> + ret = -EINVAL;
>>> + goto err_put_job;
>>> + }
>>> +
>>> ret = drm_sched_job_init(&job->base,
>>> &job->group->queues[job->queue_idx]->entity,
>>> - 1, job->group);
>>> + credits, job->group);
>>> if (ret)
>>> goto err_put_job;
>>>
>
Consider the following call sequence:
/* Upper layer */
dma_fence_begin_signalling();
lock(tainted_shared_lock);
/* Driver callback */
dma_fence_begin_signalling();
...
The driver might here use a utility that is annotated as intended for the
dma-fence signalling critical path. Now if the upper layer isn't correctly
annotated yet for whatever reason, resulting in
/* Upper layer */
lock(tainted_shared_lock);
/* Driver callback */
dma_fence_begin_signalling();
We will receive a false lockdep locking order violation notification from
dma_fence_begin_signalling(). However, entering a dma-fence signalling
critical section itself doesn't block, so it cannot cause a deadlock.
So use a successful read_trylock() annotation instead for
dma_fence_begin_signalling(). That will make sure that the locking order
is correctly registered in the first case, and doesn't register any
locking order in the second case.
The alternative is of course to make sure that the "Upper layer" is always
correctly annotated. But experience shows that's not easily achievable
in all cases.
Signed-off-by: Thomas Hellström <thomas.hellstrom(a)linux.intel.com>
---
drivers/dma-buf/dma-fence.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/drivers/dma-buf/dma-fence.c b/drivers/dma-buf/dma-fence.c
index f177c56269bb..17f632768ef9 100644
--- a/drivers/dma-buf/dma-fence.c
+++ b/drivers/dma-buf/dma-fence.c
@@ -308,8 +308,8 @@ bool dma_fence_begin_signalling(void)
if (in_atomic())
return true;
- /* ... and non-recursive readlock */
- lock_acquire(&dma_fence_lockdep_map, 0, 0, 1, 1, NULL, _RET_IP_);
+ /* ... and non-recursive successful read_trylock */
+ lock_acquire(&dma_fence_lockdep_map, 0, 1, 1, 1, NULL, _RET_IP_);
return false;
}
@@ -340,7 +340,7 @@ void __dma_fence_might_wait(void)
lock_map_acquire(&dma_fence_lockdep_map);
lock_map_release(&dma_fence_lockdep_map);
if (tmp)
- lock_acquire(&dma_fence_lockdep_map, 0, 0, 1, 1, NULL, _THIS_IP_);
+ lock_acquire(&dma_fence_lockdep_map, 0, 1, 1, 1, NULL, _THIS_IP_);
}
#endif
--
2.39.2
Hi Adrián,
kernel test robot noticed the following build errors:
[auto build test ERROR on linus/master]
[also build test ERROR on v6.11-rc7 next-20240913]
[cannot apply to drm-misc/drm-misc-next]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]
url: https://github.com/intel-lab-lkp/linux/commits/Adri-n-Larumbe/drm-panthor-i…
base: linus/master
patch link: https://lore.kernel.org/r/20240913124857.389630-2-adrian.larumbe%40collabor…
patch subject: [PATCH v6 1/5] drm/panthor: introduce job cycle and timestamp accounting
config: i386-buildonly-randconfig-003-20240915 (https://download.01.org/0day-ci/archive/20240915/202409152243.r3t2jdOJ-lkp@…)
compiler: clang version 18.1.8 (https://github.com/llvm/llvm-project 3b5b5c1ec4a3095ab096dd780e84d7ab81f3d7ff)
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20240915/202409152243.r3t2jdOJ-lkp@…)
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp(a)intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202409152243.r3t2jdOJ-lkp@intel.com/
All errors (new ones prefixed by >>):
>> drivers/gpu/drm/panthor/panthor_sched.c:2885:12: error: call to '__compiletime_assert_371' declared with 'error' attribute: min(ringbuf_size - start, size) signedness error
2885 | written = min(ringbuf_size - start, size);
| ^
include/linux/minmax.h:129:19: note: expanded from macro 'min'
129 | #define min(x, y) __careful_cmp(min, x, y)
| ^
include/linux/minmax.h:105:2: note: expanded from macro '__careful_cmp'
105 | __careful_cmp_once(op, x, y, __UNIQUE_ID(x_), __UNIQUE_ID(y_))
| ^
include/linux/minmax.h:100:2: note: expanded from macro '__careful_cmp_once'
100 | BUILD_BUG_ON_MSG(!__types_ok(x,y,ux,uy), \
| ^
note: (skipping 2 expansions in backtrace; use -fmacro-backtrace-limit=0 to see all)
include/linux/compiler_types.h:498:2: note: expanded from macro '_compiletime_assert'
498 | __compiletime_assert(condition, msg, prefix, suffix)
| ^
include/linux/compiler_types.h:491:4: note: expanded from macro '__compiletime_assert'
491 | prefix ## suffix(); \
| ^
<scratch space>:68:1: note: expanded from here
68 | __compiletime_assert_371
| ^
1 error generated.
vim +2885 drivers/gpu/drm/panthor/panthor_sched.c
2862
2863 #define JOB_INSTR(__prof, __instr) \
2864 { \
2865 .profile_mask = __prof, \
2866 .instr = __instr, \
2867 }
2868
2869 static void
2870 copy_instrs_to_ringbuf(struct panthor_queue *queue,
2871 struct panthor_job *job,
2872 struct panthor_job_ringbuf_instrs *instrs)
2873 {
2874 ssize_t ringbuf_size = panthor_kernel_bo_size(queue->ringbuf);
2875 u32 start = job->ringbuf.start & (ringbuf_size - 1);
2876 ssize_t size, written;
2877
2878 /*
2879 * We need to write a whole slot, including any trailing zeroes
2880 * that may come at the end of it. Also, because instrs.buffer has
2881 * been zero-initialised, there's no need to pad it with 0's
2882 */
2883 instrs->count = ALIGN(instrs->count, NUM_INSTRS_PER_CACHE_LINE);
2884 size = instrs->count * sizeof(u64);
> 2885 written = min(ringbuf_size - start, size);
2886
2887 memcpy(queue->ringbuf->kmap + start, instrs->buffer, written);
2888
2889 if (written < size)
2890 memcpy(queue->ringbuf->kmap,
2891 &instrs->buffer[(ringbuf_size - start)/sizeof(u64)],
2892 size - written);
2893 }
2894
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
Hi Adrián,
kernel test robot noticed the following build warnings:
[auto build test WARNING on linus/master]
[also build test WARNING on v6.11-rc7 next-20240913]
[cannot apply to drm-misc/drm-misc-next]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]
url: https://github.com/intel-lab-lkp/linux/commits/Adri-n-Larumbe/drm-panthor-i…
base: linus/master
patch link: https://lore.kernel.org/r/20240913124857.389630-2-adrian.larumbe%40collabor…
patch subject: [PATCH v6 1/5] drm/panthor: introduce job cycle and timestamp accounting
config: alpha-allyesconfig (https://download.01.org/0day-ci/archive/20240914/202409140506.OBoqSiVk-lkp@…)
compiler: alpha-linux-gcc (GCC) 13.3.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20240914/202409140506.OBoqSiVk-lkp@…)
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp(a)intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202409140506.OBoqSiVk-lkp@intel.com/
All warnings (new ones prefixed by >>):
drivers/gpu/drm/panthor/panthor_sched.c:322: warning: Excess struct member 'runnable' description in 'panthor_scheduler'
drivers/gpu/drm/panthor/panthor_sched.c:322: warning: Excess struct member 'idle' description in 'panthor_scheduler'
drivers/gpu/drm/panthor/panthor_sched.c:322: warning: Excess struct member 'waiting' description in 'panthor_scheduler'
drivers/gpu/drm/panthor/panthor_sched.c:322: warning: Excess struct member 'has_ref' description in 'panthor_scheduler'
drivers/gpu/drm/panthor/panthor_sched.c:322: warning: Excess struct member 'in_progress' description in 'panthor_scheduler'
drivers/gpu/drm/panthor/panthor_sched.c:322: warning: Excess struct member 'stopped_groups' description in 'panthor_scheduler'
>> drivers/gpu/drm/panthor/panthor_sched.c:494: warning: Function parameter or struct member 'profiling' not described in 'panthor_queue'
drivers/gpu/drm/panthor/panthor_sched.c:494: warning: Excess struct member 'mem' description in 'panthor_queue'
drivers/gpu/drm/panthor/panthor_sched.c:494: warning: Excess struct member 'input' description in 'panthor_queue'
drivers/gpu/drm/panthor/panthor_sched.c:494: warning: Excess struct member 'output' description in 'panthor_queue'
drivers/gpu/drm/panthor/panthor_sched.c:494: warning: Excess struct member 'input_fw_va' description in 'panthor_queue'
drivers/gpu/drm/panthor/panthor_sched.c:494: warning: Excess struct member 'output_fw_va' description in 'panthor_queue'
drivers/gpu/drm/panthor/panthor_sched.c:494: warning: Excess struct member 'gpu_va' description in 'panthor_queue'
drivers/gpu/drm/panthor/panthor_sched.c:494: warning: Excess struct member 'ref' description in 'panthor_queue'
drivers/gpu/drm/panthor/panthor_sched.c:494: warning: Excess struct member 'gt' description in 'panthor_queue'
drivers/gpu/drm/panthor/panthor_sched.c:494: warning: Excess struct member 'sync64' description in 'panthor_queue'
drivers/gpu/drm/panthor/panthor_sched.c:494: warning: Excess struct member 'bo' description in 'panthor_queue'
drivers/gpu/drm/panthor/panthor_sched.c:494: warning: Excess struct member 'offset' description in 'panthor_queue'
drivers/gpu/drm/panthor/panthor_sched.c:494: warning: Excess struct member 'kmap' description in 'panthor_queue'
drivers/gpu/drm/panthor/panthor_sched.c:494: warning: Excess struct member 'lock' description in 'panthor_queue'
drivers/gpu/drm/panthor/panthor_sched.c:494: warning: Excess struct member 'id' description in 'panthor_queue'
drivers/gpu/drm/panthor/panthor_sched.c:494: warning: Excess struct member 'seqno' description in 'panthor_queue'
drivers/gpu/drm/panthor/panthor_sched.c:494: warning: Excess struct member 'last_fence' description in 'panthor_queue'
drivers/gpu/drm/panthor/panthor_sched.c:494: warning: Excess struct member 'in_flight_jobs' description in 'panthor_queue'
>> drivers/gpu/drm/panthor/panthor_sched.c:494: warning: Excess struct member 'profiling_info' description in 'panthor_queue'
drivers/gpu/drm/panthor/panthor_sched.c:494: warning: Excess struct member 'slots' description in 'panthor_queue'
drivers/gpu/drm/panthor/panthor_sched.c:494: warning: Excess struct member 'slot_count' description in 'panthor_queue'
drivers/gpu/drm/panthor/panthor_sched.c:494: warning: Excess struct member 'profiling_seqno' description in 'panthor_queue'
drivers/gpu/drm/panthor/panthor_sched.c:813: warning: Excess struct member 'start' description in 'panthor_job'
drivers/gpu/drm/panthor/panthor_sched.c:813: warning: Excess struct member 'size' description in 'panthor_job'
drivers/gpu/drm/panthor/panthor_sched.c:813: warning: Excess struct member 'latest_flush' description in 'panthor_job'
drivers/gpu/drm/panthor/panthor_sched.c:813: warning: Excess struct member 'start' description in 'panthor_job'
drivers/gpu/drm/panthor/panthor_sched.c:813: warning: Excess struct member 'end' description in 'panthor_job'
>> drivers/gpu/drm/panthor/panthor_sched.c:813: warning: Excess struct member 'mask' description in 'panthor_job'
>> drivers/gpu/drm/panthor/panthor_sched.c:813: warning: Excess struct member 'slot' description in 'panthor_job'
drivers/gpu/drm/panthor/panthor_sched.c:1734: warning: Function parameter or struct member 'ptdev' not described in 'panthor_sched_report_fw_events'
drivers/gpu/drm/panthor/panthor_sched.c:1734: warning: Function parameter or struct member 'events' not described in 'panthor_sched_report_fw_events'
drivers/gpu/drm/panthor/panthor_sched.c:2626: warning: Function parameter or struct member 'ptdev' not described in 'panthor_sched_report_mmu_fault'
vim +494 drivers/gpu/drm/panthor/panthor_sched.c
de85488138247d Boris Brezillon 2024-02-29 397
de85488138247d Boris Brezillon 2024-02-29 398 /** @ringbuf: Command stream ring-buffer. */
de85488138247d Boris Brezillon 2024-02-29 399 struct panthor_kernel_bo *ringbuf;
de85488138247d Boris Brezillon 2024-02-29 400
de85488138247d Boris Brezillon 2024-02-29 401 /** @iface: Firmware interface. */
de85488138247d Boris Brezillon 2024-02-29 402 struct {
de85488138247d Boris Brezillon 2024-02-29 403 /** @mem: FW memory allocated for this interface. */
de85488138247d Boris Brezillon 2024-02-29 404 struct panthor_kernel_bo *mem;
de85488138247d Boris Brezillon 2024-02-29 405
de85488138247d Boris Brezillon 2024-02-29 406 /** @input: Input interface. */
de85488138247d Boris Brezillon 2024-02-29 407 struct panthor_fw_ringbuf_input_iface *input;
de85488138247d Boris Brezillon 2024-02-29 408
de85488138247d Boris Brezillon 2024-02-29 409 /** @output: Output interface. */
de85488138247d Boris Brezillon 2024-02-29 410 const struct panthor_fw_ringbuf_output_iface *output;
de85488138247d Boris Brezillon 2024-02-29 411
de85488138247d Boris Brezillon 2024-02-29 412 /** @input_fw_va: FW virtual address of the input interface buffer. */
de85488138247d Boris Brezillon 2024-02-29 413 u32 input_fw_va;
de85488138247d Boris Brezillon 2024-02-29 414
de85488138247d Boris Brezillon 2024-02-29 415 /** @output_fw_va: FW virtual address of the output interface buffer. */
de85488138247d Boris Brezillon 2024-02-29 416 u32 output_fw_va;
de85488138247d Boris Brezillon 2024-02-29 417 } iface;
de85488138247d Boris Brezillon 2024-02-29 418
de85488138247d Boris Brezillon 2024-02-29 419 /**
de85488138247d Boris Brezillon 2024-02-29 420 * @syncwait: Stores information about the synchronization object this
de85488138247d Boris Brezillon 2024-02-29 421 * queue is waiting on.
de85488138247d Boris Brezillon 2024-02-29 422 */
de85488138247d Boris Brezillon 2024-02-29 423 struct {
de85488138247d Boris Brezillon 2024-02-29 424 /** @gpu_va: GPU address of the synchronization object. */
de85488138247d Boris Brezillon 2024-02-29 425 u64 gpu_va;
de85488138247d Boris Brezillon 2024-02-29 426
de85488138247d Boris Brezillon 2024-02-29 427 /** @ref: Reference value to compare against. */
de85488138247d Boris Brezillon 2024-02-29 428 u64 ref;
de85488138247d Boris Brezillon 2024-02-29 429
de85488138247d Boris Brezillon 2024-02-29 430 /** @gt: True if this is a greater-than test. */
de85488138247d Boris Brezillon 2024-02-29 431 bool gt;
de85488138247d Boris Brezillon 2024-02-29 432
de85488138247d Boris Brezillon 2024-02-29 433 /** @sync64: True if this is a 64-bit sync object. */
de85488138247d Boris Brezillon 2024-02-29 434 bool sync64;
de85488138247d Boris Brezillon 2024-02-29 435
de85488138247d Boris Brezillon 2024-02-29 436 /** @bo: Buffer object holding the synchronization object. */
de85488138247d Boris Brezillon 2024-02-29 437 struct drm_gem_object *obj;
de85488138247d Boris Brezillon 2024-02-29 438
de85488138247d Boris Brezillon 2024-02-29 439 /** @offset: Offset of the synchronization object inside @bo. */
de85488138247d Boris Brezillon 2024-02-29 440 u64 offset;
de85488138247d Boris Brezillon 2024-02-29 441
de85488138247d Boris Brezillon 2024-02-29 442 /**
de85488138247d Boris Brezillon 2024-02-29 443 * @kmap: Kernel mapping of the buffer object holding the
de85488138247d Boris Brezillon 2024-02-29 444 * synchronization object.
de85488138247d Boris Brezillon 2024-02-29 445 */
de85488138247d Boris Brezillon 2024-02-29 446 void *kmap;
de85488138247d Boris Brezillon 2024-02-29 447 } syncwait;
de85488138247d Boris Brezillon 2024-02-29 448
de85488138247d Boris Brezillon 2024-02-29 449 /** @fence_ctx: Fence context fields. */
de85488138247d Boris Brezillon 2024-02-29 450 struct {
de85488138247d Boris Brezillon 2024-02-29 451 /** @lock: Used to protect access to all fences allocated by this context. */
de85488138247d Boris Brezillon 2024-02-29 452 spinlock_t lock;
de85488138247d Boris Brezillon 2024-02-29 453
de85488138247d Boris Brezillon 2024-02-29 454 /**
de85488138247d Boris Brezillon 2024-02-29 455 * @id: Fence context ID.
de85488138247d Boris Brezillon 2024-02-29 456 *
de85488138247d Boris Brezillon 2024-02-29 457 * Allocated with dma_fence_context_alloc().
de85488138247d Boris Brezillon 2024-02-29 458 */
de85488138247d Boris Brezillon 2024-02-29 459 u64 id;
de85488138247d Boris Brezillon 2024-02-29 460
de85488138247d Boris Brezillon 2024-02-29 461 /** @seqno: Sequence number of the last initialized fence. */
de85488138247d Boris Brezillon 2024-02-29 462 atomic64_t seqno;
de85488138247d Boris Brezillon 2024-02-29 463
7b6f9ec6ad5112 Boris Brezillon 2024-07-03 464 /**
7b6f9ec6ad5112 Boris Brezillon 2024-07-03 465 * @last_fence: Fence of the last submitted job.
7b6f9ec6ad5112 Boris Brezillon 2024-07-03 466 *
7b6f9ec6ad5112 Boris Brezillon 2024-07-03 467 * We return this fence when we get an empty command stream.
7b6f9ec6ad5112 Boris Brezillon 2024-07-03 468 * This way, we are guaranteed that all earlier jobs have completed
7b6f9ec6ad5112 Boris Brezillon 2024-07-03 469 * when drm_sched_job::s_fence::finished without having to feed
7b6f9ec6ad5112 Boris Brezillon 2024-07-03 470 * the CS ring buffer with a dummy job that only signals the fence.
7b6f9ec6ad5112 Boris Brezillon 2024-07-03 471 */
7b6f9ec6ad5112 Boris Brezillon 2024-07-03 472 struct dma_fence *last_fence;
7b6f9ec6ad5112 Boris Brezillon 2024-07-03 473
de85488138247d Boris Brezillon 2024-02-29 474 /**
de85488138247d Boris Brezillon 2024-02-29 475 * @in_flight_jobs: List containing all in-flight jobs.
de85488138247d Boris Brezillon 2024-02-29 476 *
de85488138247d Boris Brezillon 2024-02-29 477 * Used to keep track and signal panthor_job::done_fence when the
de85488138247d Boris Brezillon 2024-02-29 478 * synchronization object attached to the queue is signaled.
de85488138247d Boris Brezillon 2024-02-29 479 */
de85488138247d Boris Brezillon 2024-02-29 480 struct list_head in_flight_jobs;
de85488138247d Boris Brezillon 2024-02-29 481 } fence_ctx;
a706810cebb072 Adrián Larumbe 2024-09-13 482
a706810cebb072 Adrián Larumbe 2024-09-13 483 /** @profiling_info: Job profiling data slots and access information. */
a706810cebb072 Adrián Larumbe 2024-09-13 484 struct {
a706810cebb072 Adrián Larumbe 2024-09-13 485 /** @slots: Kernel BO holding the slots. */
a706810cebb072 Adrián Larumbe 2024-09-13 486 struct panthor_kernel_bo *slots;
a706810cebb072 Adrián Larumbe 2024-09-13 487
a706810cebb072 Adrián Larumbe 2024-09-13 488 /** @slot_count: Number of jobs ringbuffer can hold at once. */
a706810cebb072 Adrián Larumbe 2024-09-13 489 u32 slot_count;
a706810cebb072 Adrián Larumbe 2024-09-13 490
a706810cebb072 Adrián Larumbe 2024-09-13 491 /** @profiling_seqno: Index of the next available profiling information slot. */
a706810cebb072 Adrián Larumbe 2024-09-13 492 u32 seqno;
a706810cebb072 Adrián Larumbe 2024-09-13 493 } profiling;
de85488138247d Boris Brezillon 2024-02-29 @494 };
de85488138247d Boris Brezillon 2024-02-29 495
On Mon, 22 Jul 2024 08:53:29 +0200, Alexandre Mergnat wrote:
> This series aims to add the following audio support for the Genio 350-evk:
> - Playback
> - 2ch Headset Jack (Earphone)
> - 1ch Line-out Jack (Speaker)
> - 8ch HDMI Tx
> - Capture
> - 1ch DMIC (On-board Digital Microphone)
> - 1ch AMIC (On-board Analogic Microphone)
> - 1ch Headset Jack (External Analogic Microphone)
>
> [...]
Applied to
https://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound.git for-next
Thanks!
[01/16] ASoC: dt-bindings: mediatek,mt8365-afe: Add audio afe document
commit: ceb3ca2876243e3ea02f78b3d488b1f2d734de49
[02/16] ASoC: dt-bindings: mediatek,mt8365-mt6357: Add audio sound card document
commit: 76d80dcdd55f70b28930edb97b96ee375e1cce5a
[03/16] dt-bindings: mfd: mediatek: Add codec property for MT6357 PMIC
commit: 761cab667898d86c04867948f1b7aec1090be796
[04/16] ASoC: mediatek: mt8365: Add common header
commit: 38c7c9ddc74033406461d64e541bbc8268e77f73
[05/16] ASoC: mediatek: mt8365: Add audio clock control support
commit: ef307b40b7f0042d54f020bccb3e728ced292282
[06/16] ASoC: mediatek: mt8365: Add I2S DAI support
commit: 402bbb13a195caa83b3279ebecdabfb11ddee084
[07/16] ASoC: mediatek: mt8365: Add ADDA DAI support
commit: 7c58c88e524180e8439acdfc44872325e7f6d33d
[08/16] ASoC: mediatek: mt8365: Add DMIC DAI support
commit: 1c50ec75ce6c0c6b5736499393e522f73e19d0cf
[09/16] ASoC: mediatek: mt8365: Add PCM DAI support
commit: 5097c0c8634d703e3c59cfb89831b7db9dc46339
[10/16] ASoc: mediatek: mt8365: Add a specific soundcard for EVK
commit: 1bf6dbd75f7603dd026660bebf324f812200dc1b
[11/16] ASoC: mediatek: mt8365: Add the AFE driver support
commit: e1991d102bc2abb32331c462f8f3e77059c69578
[12/16] ASoC: codecs: add MT6357 support
(no commit info)
[13/16] ASoC: mediatek: Add MT8365 support
(no commit info)
All being well this means that it will be integrated into the linux-next
tree (usually sometime in the next 24 hours) and sent to Linus during
the next merge window (or sooner if it is a bug fix), however if
problems are discovered then the patch may be dropped or reverted.
You may get further e-mails resulting from automated or manual testing
and review of the tree, please engage with people reporting problems and
send followup patches addressing any issues that are reported if needed.
If any updates are required or you are submitting further changes they
should be sent as incremental updates against current git, existing
patches will not be replaced.
Please add any relevant lists and maintainers to the CCs when replying
to this mail.
Thanks,
Mark
Hi Adrián,
kernel test robot noticed the following build warnings:
[auto build test WARNING on drm-misc/drm-misc-next]
[also build test WARNING on linus/master v6.11-rc6 next-20240904]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]
url: https://github.com/intel-lab-lkp/linux/commits/Adri-n-Larumbe/drm-panthor-i…
base: git://anongit.freedesktop.org/drm/drm-misc drm-misc-next
patch link: https://lore.kernel.org/r/20240903202541.430225-3-adrian.larumbe%40collabor…
patch subject: [PATCH v5 2/4] drm/panthor: add DRM fdinfo support
config: x86_64-buildonly-randconfig-002-20240904 (https://download.01.org/0day-ci/archive/20240905/202409050134.uxrIkhqc-lkp@…)
compiler: gcc-12 (Debian 12.2.0-14) 12.2.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20240905/202409050134.uxrIkhqc-lkp@…)
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp(a)intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202409050134.uxrIkhqc-lkp@intel.com/
All warnings (new ones prefixed by >>):
drivers/gpu/drm/panthor/panthor_sched.c:322: warning: Excess struct member 'runnable' description in 'panthor_scheduler'
drivers/gpu/drm/panthor/panthor_sched.c:322: warning: Excess struct member 'idle' description in 'panthor_scheduler'
drivers/gpu/drm/panthor/panthor_sched.c:322: warning: Excess struct member 'waiting' description in 'panthor_scheduler'
drivers/gpu/drm/panthor/panthor_sched.c:322: warning: Excess struct member 'has_ref' description in 'panthor_scheduler'
drivers/gpu/drm/panthor/panthor_sched.c:322: warning: Excess struct member 'in_progress' description in 'panthor_scheduler'
drivers/gpu/drm/panthor/panthor_sched.c:322: warning: Excess struct member 'stopped_groups' description in 'panthor_scheduler'
drivers/gpu/drm/panthor/panthor_sched.c:494: warning: Excess struct member 'mem' description in 'panthor_queue'
drivers/gpu/drm/panthor/panthor_sched.c:494: warning: Excess struct member 'input' description in 'panthor_queue'
drivers/gpu/drm/panthor/panthor_sched.c:494: warning: Excess struct member 'output' description in 'panthor_queue'
drivers/gpu/drm/panthor/panthor_sched.c:494: warning: Excess struct member 'input_fw_va' description in 'panthor_queue'
drivers/gpu/drm/panthor/panthor_sched.c:494: warning: Excess struct member 'output_fw_va' description in 'panthor_queue'
drivers/gpu/drm/panthor/panthor_sched.c:494: warning: Excess struct member 'gpu_va' description in 'panthor_queue'
drivers/gpu/drm/panthor/panthor_sched.c:494: warning: Excess struct member 'ref' description in 'panthor_queue'
drivers/gpu/drm/panthor/panthor_sched.c:494: warning: Excess struct member 'gt' description in 'panthor_queue'
drivers/gpu/drm/panthor/panthor_sched.c:494: warning: Excess struct member 'sync64' description in 'panthor_queue'
drivers/gpu/drm/panthor/panthor_sched.c:494: warning: Excess struct member 'bo' description in 'panthor_queue'
drivers/gpu/drm/panthor/panthor_sched.c:494: warning: Excess struct member 'offset' description in 'panthor_queue'
drivers/gpu/drm/panthor/panthor_sched.c:494: warning: Excess struct member 'kmap' description in 'panthor_queue'
drivers/gpu/drm/panthor/panthor_sched.c:494: warning: Excess struct member 'lock' description in 'panthor_queue'
drivers/gpu/drm/panthor/panthor_sched.c:494: warning: Excess struct member 'id' description in 'panthor_queue'
drivers/gpu/drm/panthor/panthor_sched.c:494: warning: Excess struct member 'seqno' description in 'panthor_queue'
drivers/gpu/drm/panthor/panthor_sched.c:494: warning: Excess struct member 'last_fence' description in 'panthor_queue'
drivers/gpu/drm/panthor/panthor_sched.c:494: warning: Excess struct member 'in_flight_jobs' description in 'panthor_queue'
drivers/gpu/drm/panthor/panthor_sched.c:494: warning: Excess struct member 'slots' description in 'panthor_queue'
drivers/gpu/drm/panthor/panthor_sched.c:494: warning: Excess struct member 'slot_count' description in 'panthor_queue'
drivers/gpu/drm/panthor/panthor_sched.c:494: warning: Excess struct member 'profiling_seqno' description in 'panthor_queue'
>> drivers/gpu/drm/panthor/panthor_sched.c:689: warning: Excess struct member 'data' description in 'panthor_group'
>> drivers/gpu/drm/panthor/panthor_sched.c:689: warning: Excess struct member 'lock' description in 'panthor_group'
drivers/gpu/drm/panthor/panthor_sched.c:822: warning: Function parameter or struct member 'profiling_slot' not described in 'panthor_job'
drivers/gpu/drm/panthor/panthor_sched.c:822: warning: Excess struct member 'start' description in 'panthor_job'
drivers/gpu/drm/panthor/panthor_sched.c:822: warning: Excess struct member 'size' description in 'panthor_job'
drivers/gpu/drm/panthor/panthor_sched.c:822: warning: Excess struct member 'latest_flush' description in 'panthor_job'
drivers/gpu/drm/panthor/panthor_sched.c:822: warning: Excess struct member 'start' description in 'panthor_job'
drivers/gpu/drm/panthor/panthor_sched.c:822: warning: Excess struct member 'end' description in 'panthor_job'
drivers/gpu/drm/panthor/panthor_sched.c:822: warning: Excess struct member 'profile_slot' description in 'panthor_job'
drivers/gpu/drm/panthor/panthor_sched.c:1745: warning: Function parameter or struct member 'ptdev' not described in 'panthor_sched_report_fw_events'
drivers/gpu/drm/panthor/panthor_sched.c:1745: warning: Function parameter or struct member 'events' not described in 'panthor_sched_report_fw_events'
drivers/gpu/drm/panthor/panthor_sched.c:2637: warning: Function parameter or struct member 'ptdev' not described in 'panthor_sched_report_mmu_fault'
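The "Excess struct member" warnings above are the usual symptom of kernel-doc meeting anonymous nested structs: members documented at the top level (e.g. `data`, `lock`) cannot be matched to fields living inside an unnamed inner struct such as `fdinfo`. kernel-doc accepts a dotted path for nested members, so a sketch of a fix for the two `panthor_group` warnings (assuming the nesting shown in the quoted source below) would look like:

```c
/**
 * struct panthor_group - Scheduling group object
 * ...
 * @fdinfo: Per-file total cycle and timestamp values reference.
 * @fdinfo.data: Pointer to actual per-file sample data.
 * @fdinfo.lock: Mutex to govern concurrent access from drm file's fdinfo
 *               callback and job post-completion processing function.
 */
```

The same `@outer.member` pattern applies to the `panthor_queue` warnings at line 494, one dotted entry per nested field.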
vim +689 drivers/gpu/drm/panthor/panthor_sched.c
de85488138247d0 Boris Brezillon 2024-02-29 531
de85488138247d0 Boris Brezillon 2024-02-29 532 /**
de85488138247d0 Boris Brezillon 2024-02-29 533 * struct panthor_group - Scheduling group object
de85488138247d0 Boris Brezillon 2024-02-29 534 */
de85488138247d0 Boris Brezillon 2024-02-29 535 struct panthor_group {
de85488138247d0 Boris Brezillon 2024-02-29 536 /** @refcount: Reference count */
de85488138247d0 Boris Brezillon 2024-02-29 537 struct kref refcount;
de85488138247d0 Boris Brezillon 2024-02-29 538
de85488138247d0 Boris Brezillon 2024-02-29 539 /** @ptdev: Device. */
de85488138247d0 Boris Brezillon 2024-02-29 540 struct panthor_device *ptdev;
de85488138247d0 Boris Brezillon 2024-02-29 541
de85488138247d0 Boris Brezillon 2024-02-29 542 /** @vm: VM bound to the group. */
de85488138247d0 Boris Brezillon 2024-02-29 543 struct panthor_vm *vm;
de85488138247d0 Boris Brezillon 2024-02-29 544
de85488138247d0 Boris Brezillon 2024-02-29 545 /** @compute_core_mask: Mask of shader cores that can be used for compute jobs. */
de85488138247d0 Boris Brezillon 2024-02-29 546 u64 compute_core_mask;
de85488138247d0 Boris Brezillon 2024-02-29 547
de85488138247d0 Boris Brezillon 2024-02-29 548 /** @fragment_core_mask: Mask of shader cores that can be used for fragment jobs. */
de85488138247d0 Boris Brezillon 2024-02-29 549 u64 fragment_core_mask;
de85488138247d0 Boris Brezillon 2024-02-29 550
de85488138247d0 Boris Brezillon 2024-02-29 551 /** @tiler_core_mask: Mask of tiler cores that can be used for tiler jobs. */
de85488138247d0 Boris Brezillon 2024-02-29 552 u64 tiler_core_mask;
de85488138247d0 Boris Brezillon 2024-02-29 553
de85488138247d0 Boris Brezillon 2024-02-29 554 /** @max_compute_cores: Maximum number of shader cores used for compute jobs. */
de85488138247d0 Boris Brezillon 2024-02-29 555 u8 max_compute_cores;
de85488138247d0 Boris Brezillon 2024-02-29 556
be7ffc821f5fc2e Liviu Dudau 2024-04-02 557 /** @max_fragment_cores: Maximum number of shader cores used for fragment jobs. */
de85488138247d0 Boris Brezillon 2024-02-29 558 u8 max_fragment_cores;
de85488138247d0 Boris Brezillon 2024-02-29 559
de85488138247d0 Boris Brezillon 2024-02-29 560 /** @max_tiler_cores: Maximum number of tiler cores used for tiler jobs. */
de85488138247d0 Boris Brezillon 2024-02-29 561 u8 max_tiler_cores;
de85488138247d0 Boris Brezillon 2024-02-29 562
de85488138247d0 Boris Brezillon 2024-02-29 563 /** @priority: Group priority (check panthor_csg_priority). */
de85488138247d0 Boris Brezillon 2024-02-29 564 u8 priority;
de85488138247d0 Boris Brezillon 2024-02-29 565
de85488138247d0 Boris Brezillon 2024-02-29 566 /** @blocked_queues: Bitmask reflecting the blocked queues. */
de85488138247d0 Boris Brezillon 2024-02-29 567 u32 blocked_queues;
de85488138247d0 Boris Brezillon 2024-02-29 568
de85488138247d0 Boris Brezillon 2024-02-29 569 /** @idle_queues: Bitmask reflecting the idle queues. */
de85488138247d0 Boris Brezillon 2024-02-29 570 u32 idle_queues;
de85488138247d0 Boris Brezillon 2024-02-29 571
de85488138247d0 Boris Brezillon 2024-02-29 572 /** @fatal_lock: Lock used to protect access to fatal fields. */
de85488138247d0 Boris Brezillon 2024-02-29 573 spinlock_t fatal_lock;
de85488138247d0 Boris Brezillon 2024-02-29 574
de85488138247d0 Boris Brezillon 2024-02-29 575 /** @fatal_queues: Bitmask reflecting the queues that hit a fatal exception. */
de85488138247d0 Boris Brezillon 2024-02-29 576 u32 fatal_queues;
de85488138247d0 Boris Brezillon 2024-02-29 577
de85488138247d0 Boris Brezillon 2024-02-29 578 /** @tiler_oom: Mask of queues that have a tiler OOM event to process. */
de85488138247d0 Boris Brezillon 2024-02-29 579 atomic_t tiler_oom;
de85488138247d0 Boris Brezillon 2024-02-29 580
de85488138247d0 Boris Brezillon 2024-02-29 581 /** @queue_count: Number of queues in this group. */
de85488138247d0 Boris Brezillon 2024-02-29 582 u32 queue_count;
de85488138247d0 Boris Brezillon 2024-02-29 583
de85488138247d0 Boris Brezillon 2024-02-29 584 /** @queues: Queues owned by this group. */
de85488138247d0 Boris Brezillon 2024-02-29 585 struct panthor_queue *queues[MAX_CS_PER_CSG];
de85488138247d0 Boris Brezillon 2024-02-29 586
de85488138247d0 Boris Brezillon 2024-02-29 587 /**
de85488138247d0 Boris Brezillon 2024-02-29 588 * @csg_id: ID of the FW group slot.
de85488138247d0 Boris Brezillon 2024-02-29 589 *
de85488138247d0 Boris Brezillon 2024-02-29 590 * -1 when the group is not scheduled/active.
de85488138247d0 Boris Brezillon 2024-02-29 591 */
de85488138247d0 Boris Brezillon 2024-02-29 592 int csg_id;
de85488138247d0 Boris Brezillon 2024-02-29 593
de85488138247d0 Boris Brezillon 2024-02-29 594 /**
de85488138247d0 Boris Brezillon 2024-02-29 595 * @destroyed: True when the group has been destroyed.
de85488138247d0 Boris Brezillon 2024-02-29 596 *
de85488138247d0 Boris Brezillon 2024-02-29 597 * If a group is destroyed it becomes useless: no further jobs can be submitted
de85488138247d0 Boris Brezillon 2024-02-29 598 * to its queues. We simply wait for all references to be dropped so we can
de85488138247d0 Boris Brezillon 2024-02-29 599 * release the group object.
de85488138247d0 Boris Brezillon 2024-02-29 600 */
de85488138247d0 Boris Brezillon 2024-02-29 601 bool destroyed;
de85488138247d0 Boris Brezillon 2024-02-29 602
de85488138247d0 Boris Brezillon 2024-02-29 603 /**
de85488138247d0 Boris Brezillon 2024-02-29 604 * @timedout: True when a timeout occurred on any of the queues owned by
de85488138247d0 Boris Brezillon 2024-02-29 605 * this group.
de85488138247d0 Boris Brezillon 2024-02-29 606 *
de85488138247d0 Boris Brezillon 2024-02-29 607 * Timeouts can be reported by drm_sched or by the FW. In any case, any
de85488138247d0 Boris Brezillon 2024-02-29 608 * timeout situation is unrecoverable, and the group becomes useless.
de85488138247d0 Boris Brezillon 2024-02-29 609 * We simply wait for all references to be dropped so we can release the
de85488138247d0 Boris Brezillon 2024-02-29 610 * group object.
de85488138247d0 Boris Brezillon 2024-02-29 611 */
de85488138247d0 Boris Brezillon 2024-02-29 612 bool timedout;
de85488138247d0 Boris Brezillon 2024-02-29 613
de85488138247d0 Boris Brezillon 2024-02-29 614 /**
de85488138247d0 Boris Brezillon 2024-02-29 615 * @syncobjs: Pool of per-queue synchronization objects.
de85488138247d0 Boris Brezillon 2024-02-29 616 *
de85488138247d0 Boris Brezillon 2024-02-29 617 * One sync object per queue. The position of the sync object is
de85488138247d0 Boris Brezillon 2024-02-29 618 * determined by the queue index.
de85488138247d0 Boris Brezillon 2024-02-29 619 */
de85488138247d0 Boris Brezillon 2024-02-29 620 struct panthor_kernel_bo *syncobjs;
de85488138247d0 Boris Brezillon 2024-02-29 621
d7baaf2591f58fc Adrián Larumbe 2024-09-03 622 /** @fdinfo: Per-file total cycle and timestamp values reference. */
d7baaf2591f58fc Adrián Larumbe 2024-09-03 623 struct {
d7baaf2591f58fc Adrián Larumbe 2024-09-03 624 /** @data: Pointer to actual per-file sample data. */
d7baaf2591f58fc Adrián Larumbe 2024-09-03 625 struct panthor_gpu_usage *data;
d7baaf2591f58fc Adrián Larumbe 2024-09-03 626
d7baaf2591f58fc Adrián Larumbe 2024-09-03 627 /**
d7baaf2591f58fc Adrián Larumbe 2024-09-03 628 * @lock: Mutex to govern concurrent access from drm file's fdinfo callback
d7baaf2591f58fc Adrián Larumbe 2024-09-03 629 * and job post-completion processing function
d7baaf2591f58fc Adrián Larumbe 2024-09-03 630 */
d7baaf2591f58fc Adrián Larumbe 2024-09-03 631 struct mutex lock;
d7baaf2591f58fc Adrián Larumbe 2024-09-03 632 } fdinfo;
d7baaf2591f58fc Adrián Larumbe 2024-09-03 633
de85488138247d0 Boris Brezillon 2024-02-29 634 /** @state: Group state. */
de85488138247d0 Boris Brezillon 2024-02-29 635 enum panthor_group_state state;
de85488138247d0 Boris Brezillon 2024-02-29 636
de85488138247d0 Boris Brezillon 2024-02-29 637 /**
de85488138247d0 Boris Brezillon 2024-02-29 638 * @suspend_buf: Suspend buffer.
de85488138247d0 Boris Brezillon 2024-02-29 639 *
de85488138247d0 Boris Brezillon 2024-02-29 640 * Stores the state of the group and its queues when a group is suspended.
de85488138247d0 Boris Brezillon 2024-02-29 641 * Used at resume time to restore the group in its previous state.
de85488138247d0 Boris Brezillon 2024-02-29 642 *
de85488138247d0 Boris Brezillon 2024-02-29 643 * The size of the suspend buffer is exposed through the FW interface.
de85488138247d0 Boris Brezillon 2024-02-29 644 */
de85488138247d0 Boris Brezillon 2024-02-29 645 struct panthor_kernel_bo *suspend_buf;
de85488138247d0 Boris Brezillon 2024-02-29 646
de85488138247d0 Boris Brezillon 2024-02-29 647 /**
de85488138247d0 Boris Brezillon 2024-02-29 648 * @protm_suspend_buf: Protection mode suspend buffer.
de85488138247d0 Boris Brezillon 2024-02-29 649 *
de85488138247d0 Boris Brezillon 2024-02-29 650 * Stores the state of the group and its queues when a group that's in
de85488138247d0 Boris Brezillon 2024-02-29 651 * protection mode is suspended.
de85488138247d0 Boris Brezillon 2024-02-29 652 *
de85488138247d0 Boris Brezillon 2024-02-29 653 * Used at resume time to restore the group in its previous state.
de85488138247d0 Boris Brezillon 2024-02-29 654 *
de85488138247d0 Boris Brezillon 2024-02-29 655 * The size of the protection mode suspend buffer is exposed through the
de85488138247d0 Boris Brezillon 2024-02-29 656 * FW interface.
de85488138247d0 Boris Brezillon 2024-02-29 657 */
de85488138247d0 Boris Brezillon 2024-02-29 658 struct panthor_kernel_bo *protm_suspend_buf;
de85488138247d0 Boris Brezillon 2024-02-29 659
de85488138247d0 Boris Brezillon 2024-02-29 660 /** @sync_upd_work: Work used to check/signal job fences. */
de85488138247d0 Boris Brezillon 2024-02-29 661 struct work_struct sync_upd_work;
de85488138247d0 Boris Brezillon 2024-02-29 662
de85488138247d0 Boris Brezillon 2024-02-29 663 /** @tiler_oom_work: Work used to process tiler OOM events happening on this group. */
de85488138247d0 Boris Brezillon 2024-02-29 664 struct work_struct tiler_oom_work;
de85488138247d0 Boris Brezillon 2024-02-29 665
de85488138247d0 Boris Brezillon 2024-02-29 666 /** @term_work: Work used to finish the group termination procedure. */
de85488138247d0 Boris Brezillon 2024-02-29 667 struct work_struct term_work;
de85488138247d0 Boris Brezillon 2024-02-29 668
de85488138247d0 Boris Brezillon 2024-02-29 669 /**
de85488138247d0 Boris Brezillon 2024-02-29 670 * @release_work: Work used to release group resources.
de85488138247d0 Boris Brezillon 2024-02-29 671 *
de85488138247d0 Boris Brezillon 2024-02-29 672 * We need to postpone the group release to avoid a deadlock when
de85488138247d0 Boris Brezillon 2024-02-29 673 * the last ref is released in the tick work.
de85488138247d0 Boris Brezillon 2024-02-29 674 */
de85488138247d0 Boris Brezillon 2024-02-29 675 struct work_struct release_work;
de85488138247d0 Boris Brezillon 2024-02-29 676
de85488138247d0 Boris Brezillon 2024-02-29 677 /**
de85488138247d0 Boris Brezillon 2024-02-29 678 * @run_node: Node used to insert the group in the
de85488138247d0 Boris Brezillon 2024-02-29 679 * panthor_group::groups::{runnable,idle} and
de85488138247d0 Boris Brezillon 2024-02-29 680 * panthor_group::reset.stopped_groups lists.
de85488138247d0 Boris Brezillon 2024-02-29 681 */
de85488138247d0 Boris Brezillon 2024-02-29 682 struct list_head run_node;
de85488138247d0 Boris Brezillon 2024-02-29 683
de85488138247d0 Boris Brezillon 2024-02-29 684 /**
de85488138247d0 Boris Brezillon 2024-02-29 685 * @wait_node: Node used to insert the group in the
de85488138247d0 Boris Brezillon 2024-02-29 686 * panthor_group::groups::waiting list.
de85488138247d0 Boris Brezillon 2024-02-29 687 */
de85488138247d0 Boris Brezillon 2024-02-29 688 struct list_head wait_node;
de85488138247d0 Boris Brezillon 2024-02-29 @689 };
de85488138247d0 Boris Brezillon 2024-02-29 690
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
Hi Adrián,
kernel test robot noticed the following build errors:
[auto build test ERROR on drm-misc/drm-misc-next]
[also build test ERROR on linus/master v6.11-rc6 next-20240904]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting your patch, we suggest using '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]
url: https://github.com/intel-lab-lkp/linux/commits/Adri-n-Larumbe/drm-panthor-i…
base: git://anongit.freedesktop.org/drm/drm-misc drm-misc-next
patch link: https://lore.kernel.org/r/20240903202541.430225-2-adrian.larumbe%40collabor…
patch subject: [PATCH v5 1/4] drm/panthor: introduce job cycle and timestamp accounting
config: arc-allmodconfig (https://download.01.org/0day-ci/archive/20240905/202409050054.oRwtzLQ4-lkp@…)
compiler: arceb-elf-gcc (GCC) 13.2.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20240905/202409050054.oRwtzLQ4-lkp@…)
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add the following tags
| Reported-by: kernel test robot <lkp(a)intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202409050054.oRwtzLQ4-lkp@intel.com/
All errors (new ones prefixed by >>):
In file included from <command-line>:
In function 'copy_instrs_to_ringbuf',
inlined from 'queue_run_job' at drivers/gpu/drm/panthor/panthor_sched.c:3089:2:
>> include/linux/compiler_types.h:510:45: error: call to '__compiletime_assert_435' declared with attribute error: min(ringbuf_size - start, size) signedness error
510 | _compiletime_assert(condition, msg, __compiletime_assert_, __COUNTER__)
| ^
include/linux/compiler_types.h:491:25: note: in definition of macro '__compiletime_assert'
491 | prefix ## suffix(); \
| ^~~~~~
include/linux/compiler_types.h:510:9: note: in expansion of macro '_compiletime_assert'
510 | _compiletime_assert(condition, msg, __compiletime_assert_, __COUNTER__)
| ^~~~~~~~~~~~~~~~~~~
include/linux/build_bug.h:39:37: note: in expansion of macro 'compiletime_assert'
39 | #define BUILD_BUG_ON_MSG(cond, msg) compiletime_assert(!(cond), msg)
| ^~~~~~~~~~~~~~~~~~
include/linux/minmax.h:100:9: note: in expansion of macro 'BUILD_BUG_ON_MSG'
100 | BUILD_BUG_ON_MSG(!__types_ok(x,y,ux,uy), \
| ^~~~~~~~~~~~~~~~
include/linux/minmax.h:105:9: note: in expansion of macro '__careful_cmp_once'
105 | __careful_cmp_once(op, x, y, __UNIQUE_ID(x_), __UNIQUE_ID(y_))
| ^~~~~~~~~~~~~~~~~~
include/linux/minmax.h:129:25: note: in expansion of macro '__careful_cmp'
129 | #define min(x, y) __careful_cmp(min, x, y)
| ^~~~~~~~~~~~~
drivers/gpu/drm/panthor/panthor_sched.c:2882:19: note: in expansion of macro 'min'
2882 | written = min(ringbuf_size - start, size);
| ^~~
vim +/__compiletime_assert_435 +510 include/linux/compiler_types.h
eb5c2d4b45e3d2 Will Deacon 2020-07-21 496
eb5c2d4b45e3d2 Will Deacon 2020-07-21 497 #define _compiletime_assert(condition, msg, prefix, suffix) \
eb5c2d4b45e3d2 Will Deacon 2020-07-21 498 __compiletime_assert(condition, msg, prefix, suffix)
eb5c2d4b45e3d2 Will Deacon 2020-07-21 499
eb5c2d4b45e3d2 Will Deacon 2020-07-21 500 /**
eb5c2d4b45e3d2 Will Deacon 2020-07-21 501 * compiletime_assert - break build and emit msg if condition is false
eb5c2d4b45e3d2 Will Deacon 2020-07-21 502 * @condition: a compile-time constant condition to check
eb5c2d4b45e3d2 Will Deacon 2020-07-21 503 * @msg: a message to emit if condition is false
eb5c2d4b45e3d2 Will Deacon 2020-07-21 504 *
eb5c2d4b45e3d2 Will Deacon 2020-07-21 505 * In tradition of POSIX assert, this macro will break the build if the
eb5c2d4b45e3d2 Will Deacon 2020-07-21 506 * supplied condition is *false*, emitting the supplied error message if the
eb5c2d4b45e3d2 Will Deacon 2020-07-21 507 * compiler has support to do so.
eb5c2d4b45e3d2 Will Deacon 2020-07-21 508 */
eb5c2d4b45e3d2 Will Deacon 2020-07-21 509 #define compiletime_assert(condition, msg) \
eb5c2d4b45e3d2 Will Deacon 2020-07-21 @510 _compiletime_assert(condition, msg, __compiletime_assert_, __COUNTER__)
eb5c2d4b45e3d2 Will Deacon 2020-07-21 511