On 02/10/2024 09:38, Boris Brezillon wrote:
> On Tue, 24 Sep 2024 00:06:21 +0100
> Adrián Larumbe <adrian.larumbe(a)collabora.com> wrote:
>
>> +static u32 calc_profiling_ringbuf_num_slots(struct panthor_device *ptdev,
>> + u32 cs_ringbuf_size)
>> +{
>> + u32 min_profiled_job_instrs = U32_MAX;
>> + u32 last_flag = fls(PANTHOR_DEVICE_PROFILING_ALL);
>> +
>> + /*
>> +	 * We want to calculate the minimum size of a profiled job's CS:
>> +	 * since profiled jobs need additional instructions for the sampling
>> +	 * of performance metrics, they might take up further slots in
>> + * the queue's ringbuffer. This means we might not need as many job
>> + * slots for keeping track of their profiling information. What we
>> + * need is the maximum number of slots we should allocate to this end,
>> + * which matches the maximum number of profiled jobs we can place
>> + * simultaneously in the queue's ring buffer.
>> + * That has to be calculated separately for every single job profiling
>> + * flag, but not in the case job profiling is disabled, since unprofiled
>> + * jobs don't need to keep track of this at all.
>> + */
>> + for (u32 i = 0; i < last_flag; i++) {
>> + if (BIT(i) & PANTHOR_DEVICE_PROFILING_ALL)
>
> I'll get rid of this check when applying, as suggested by Steve. Steve,
> with this modification do you want me to add your R-b?
Yes, please do.
Thanks,
Steve
> BTW, I've also fixed a bunch of checkpatch errors/warnings, so you
> might want to run checkpatch --strict next time.
>
>> + min_profiled_job_instrs =
>> + min(min_profiled_job_instrs, calc_job_credits(BIT(i)));
>> + }
>> +
>> + return DIV_ROUND_UP(cs_ringbuf_size, min_profiled_job_instrs * sizeof(u64));
>> +}
Hi,
Am 30.09.24 um 21:38 schrieb Zichen Xie:
> Dear Linux Developers for DMA BUFFER SHARING FRAMEWORK,
>
> We are curious about the function 'dma_resv_get_fences' here:
> https://elixir.bootlin.com/linux/v6.11/source/drivers/dma-buf/dma-resv.c#L5…,
> and the logic below:
> ```
> dma_resv_for_each_fence_unlocked(&cursor, fence) {
>
> if (dma_resv_iter_is_restarted(&cursor)) {
> struct dma_fence **new_fences;
> unsigned int count;
>
> while (*num_fences)
> dma_fence_put((*fences)[--(*num_fences)]);
>
> count = cursor.num_fences + 1;
>
> /* Eventually re-allocate the array */
> new_fences = krealloc_array(*fences, count,
> sizeof(void *),
> GFP_KERNEL);
> if (count && !new_fences) {
> kfree(*fences);
> *fences = NULL;
> *num_fences = 0;
> dma_resv_iter_end(&cursor);
> return -ENOMEM;
> }
> *fences = new_fences;
> }
>
> (*fences)[(*num_fences)++] = dma_fence_get(fence);
> }
> ```
> The existing check 'if (count && !new_fences)' may fail if count==0,
> and 'krealloc_array' with count==0 is undefined behavior. The
> realloc may fail and return a NULL pointer, leading to a NULL Pointer
> Dereference in '(*fences)[(*num_fences)++] = dma_fence_get(fence);'
You already answered the question yourself: "count = cursor.num_fences +
1;", so count can never be 0.
What could theoretically be possible is that num_fences overflows, but
this value isn't userspace controllable and we would run into memory
allocation failures long before that happened.
But we could potentially remove this whole handling since if there are
no fences in the dma_resv object we don't enter the loop in the first place.
Regards,
Christian.
>
> Please correct us if we miss some key prerequisites for this function!
> Thank you very much!
On 27/09/2024 15:53, Adrián Larumbe wrote:
> On 25.09.2024 10:56, Steven Price wrote:
>> On 23/09/2024 21:43, Adrián Larumbe wrote:
>>> Hi Steve,
>>>
>>> On 23.09.2024 09:55, Steven Price wrote:
>>>> On 20/09/2024 23:36, Adrián Larumbe wrote:
>>>>> Hi Steve, thanks for the review.
>>>>
>>>> Hi Adrián,
>>>>
>>>>> I've applied all of your suggestions for the next patch series revision, so I'll
>>>>> only answer your question about the calc_profiling_ringbuf_num_slots
>>>>> function further down below.
>>>>>
>>>>
>>>> [...]
>>>>
>>>>>>> @@ -3003,6 +3190,34 @@ static const struct drm_sched_backend_ops panthor_queue_sched_ops = {
>>>>>>> .free_job = queue_free_job,
>>>>>>> };
>>>>>>>
>>>>>>> +static u32 calc_profiling_ringbuf_num_slots(struct panthor_device *ptdev,
>>>>>>> + u32 cs_ringbuf_size)
>>>>>>> +{
>>>>>>> + u32 min_profiled_job_instrs = U32_MAX;
>>>>>>> + u32 last_flag = fls(PANTHOR_DEVICE_PROFILING_ALL);
>>>>>>> +
>>>>>>> + /*
>>>>>>> +	 * We want to calculate the minimum size of a profiled job's CS:
>>>>>>> +	 * since profiled jobs need additional instructions for the sampling
>>>>>>> +	 * of performance metrics, they might take up further slots in
>>>>>>> + * the queue's ringbuffer. This means we might not need as many job
>>>>>>> + * slots for keeping track of their profiling information. What we
>>>>>>> + * need is the maximum number of slots we should allocate to this end,
>>>>>>> + * which matches the maximum number of profiled jobs we can place
>>>>>>> + * simultaneously in the queue's ring buffer.
>>>>>>> + * That has to be calculated separately for every single job profiling
>>>>>>> + * flag, but not in the case job profiling is disabled, since unprofiled
>>>>>>> + * jobs don't need to keep track of this at all.
>>>>>>> + */
>>>>>>> + for (u32 i = 0; i < last_flag; i++) {
>>>>>>> + if (BIT(i) & PANTHOR_DEVICE_PROFILING_ALL)
>>>>>>> + min_profiled_job_instrs =
>>>>>>> + min(min_profiled_job_instrs, calc_job_credits(BIT(i)));
>>>>>>> + }
>>>>>>> +
>>>>>>> + return DIV_ROUND_UP(cs_ringbuf_size, min_profiled_job_instrs * sizeof(u64));
>>>>>>> +}
>>>>>>
>>>>>> I may be missing something, but is there a situation where this is
>>>>>> different to calc_job_credits(0)? AFAICT the infrastructure you've added
>>>>>> can only add extra instructions to the no-flags case - whereas this
>>>>>> implies you're thinking that instructions may also be removed (or replaced).
>>>>>>
>>>>>> Steve
>>>>>
>>>>> Since we create a separate kernel BO to hold the profiling information slots, we
>>>>> need one that would be able to accommodate as many slots as the maximum number of
>>>>> profiled jobs we can insert simultaneously into the queue's ring buffer. Because
>>>>> profiled jobs always take more instructions than unprofiled ones, then we would
>>>>> usually need fewer slots than the number of unprofiled jobs we could insert at
>>>>> once in the ring buffer.
>>>>>
>>>>> Because we represent profiling metrics with a bit mask, then we need to test the
>>>>> size of the CS for every single metric enabled in isolation, since enabling more
>>>>> than one will always mean a bigger CS, and therefore fewer jobs tracked at once
>>>>> in the queue's ring buffer.
>>>>>
>>>>> In our case, calling calc_job_credits(0) would simply tell us the number of
>>>>> instructions we need for a normal job with no profiled features enabled, which
>>>>> would always require fewer instructions than profiled ones, and therefore more
>>>>> slots in the profiling info kernel BO. But we don't need to keep track of
>>>>> profiling numbers for unprofiled jobs, so there's no point in calculating this
>>>>> number.
>>>>>
>>>>> At first I was simply allocating a profiling info kernel BO as big as the number
>>>>> of simultaneous unprofiled job slots in the ring queue, but Boris pointed out
>>>>> that since queue ringbuffers can be as big as 2GiB, a lot of this memory would
>>>>> be wasted, since profiled jobs always require more slots because they hold more
>>>>> instructions, so fewer profiling slots in said kernel BO.
>>>>>
>>>>> The value of this approach will eventually manifest if we decide to keep track of
>>>>> more profiling metrics, since this code won't have to change at all, other than
>>>>> adding new profiling flags in the panthor_device_profiling_flags enum.
>>>>
>>>> Thanks for the detailed explanation. I think what I was missing is that
>>>> the loop is checking each bit flag independently and *not* checking
>>>> calc_job_credits(0).
>>>>
>>>> The check for (BIT(i) & PANTHOR_DEVICE_PROFILING_ALL) is probably what
>>>> confused me - that should be completely redundant. Or at least we need
>>>> something more intelligent if we have profiling bits which are not
>>>> mutually compatible.
>>>
>>> I thought of an alternative that would only test bits that are actually part of
>>> the mask:
>>>
>>> static u32 calc_profiling_ringbuf_num_slots(struct panthor_device *ptdev,
>>> u32 cs_ringbuf_size)
>>> {
>>> u32 min_profiled_job_instrs = U32_MAX;
>>> u32 profiling_mask = PANTHOR_DEVICE_PROFILING_ALL;
>>>
>>> while (profiling_mask) {
>>> u32 i = ffs(profiling_mask) - 1;
>>> profiling_mask &= ~BIT(i);
>>> min_profiled_job_instrs =
>>> min(min_profiled_job_instrs, calc_job_credits(BIT(i)));
>>> }
>>>
>>> return DIV_ROUND_UP(cs_ringbuf_size, min_profiled_job_instrs * sizeof(u64));
>>> }
>>>
>>> However, I don't think this would be more efficient: ffs() probably fetches
>>> the first set bit through register shifts, and I guess that would take somewhat
>>> longer than simply iterating over every bit up to the last one, even when also
>>> matching each bit against the whole mask in case future additions of performance
>>> metrics leave some of the lower-significance bits untouched.
>>
>> Efficiency isn't very important here - we're not on a fast path, so it's
>> more about ensuring the code is readable. I don't think the above is
>> more readable than the original for loop.
>>
>>> Regarding your question about mutual compatibility, I don't think that is an
>>> issue here, because we're testing bits in isolation. If in the future we find
>>> out that some of the values we're profiling cannot be sampled at once, we can
>>> add that logic to the sysfs knob handler, to make sure UM cannot set forbidden
>>> profiling masks.
>>
>> My comment about compatibility is because in the original above you were
>> calculating the top bit of PANTHOR_DEVICE_PROFILING_ALL:
>>
>>> u32 last_flag = fls(PANTHOR_DEVICE_PROFILING_ALL);
>>
>> then looping between 0 and that bit:
>>
>>> for (u32 i = 0; i < last_flag; i++) {
>>
>> So the test:
>>
>>> if (BIT(i) & PANTHOR_DEVICE_PROFILING_ALL)
>>
>> would only fail if PANTHOR_DEVICE_PROFILING_ALL had gaps in the bits
>> that it set. The only reason I can think for that to be true in the
>> future is if there is some sort of incompatibility - e.g. maybe there's
>> an old and new way of doing some form of profiling with the old way
>> being kept for backwards compatibility. But I suspect if/when that is
>> required we'll need to revisit this function anyway. So that 'if'
>> statement seems completely redundant (it's trivially always true).
>
> I think you're right about this. Would you be fine with the rest of the patch
> as it is in revision 8 if I also deleted this bitmask check?
Yes the rest of it looks fine.
Thanks,
Steve
>> Steve
>>
>>>> I'm also not entirely sure that the amount of RAM saved is significant,
>>>> but you've already written the code so we might as well have the saving ;)
>>>
>>> I think this was more evident before Boris suggested we reduce the basic slot
>>> size to that of a single cache line, because then the minimum profiled job
>>> might've taken twice as many ringbuffer slots as a nonprofiled one. In that
>>> case, we would need a BO half as big for holding the sampled data (in case the
>>> smallest profiled job's CS extended over the 16-instruction boundary).
>>> I still think this is a good idea so that in the future we don't need to worry
>>> about adjusting the code that deals with preparing the right boilerplate CS,
>>> since it'll only be a matter of adding new instructions inside prepare_job_instrs().
>>>
>>>> Thanks,
>>>> Steve
>>>>
>>>>> Regards,
>>>>> Adrian
>>>>>
>>>>>>> +
>>>>>>> static struct panthor_queue *
>>>>>>> group_create_queue(struct panthor_group *group,
>>>>>>> const struct drm_panthor_queue_create *args)
>>>>>>> @@ -3056,9 +3271,35 @@ group_create_queue(struct panthor_group *group,
>>>>>>> goto err_free_queue;
>>>>>>> }
>>>>>>>
>>>>>>> + queue->profiling.slot_count =
>>>>>>> + calc_profiling_ringbuf_num_slots(group->ptdev, args->ringbuf_size);
>>>>>>> +
>>>>>>> + queue->profiling.slots =
>>>>>>> + panthor_kernel_bo_create(group->ptdev, group->vm,
>>>>>>> + queue->profiling.slot_count *
>>>>>>> + sizeof(struct panthor_job_profiling_data),
>>>>>>> + DRM_PANTHOR_BO_NO_MMAP,
>>>>>>> + DRM_PANTHOR_VM_BIND_OP_MAP_NOEXEC |
>>>>>>> + DRM_PANTHOR_VM_BIND_OP_MAP_UNCACHED,
>>>>>>> + PANTHOR_VM_KERNEL_AUTO_VA);
>>>>>>> +
>>>>>>> + if (IS_ERR(queue->profiling.slots)) {
>>>>>>> + ret = PTR_ERR(queue->profiling.slots);
>>>>>>> + goto err_free_queue;
>>>>>>> + }
>>>>>>> +
>>>>>>> + ret = panthor_kernel_bo_vmap(queue->profiling.slots);
>>>>>>> + if (ret)
>>>>>>> + goto err_free_queue;
>>>>>>> +
>>>>>>> + /*
>>>>>>> + * Credit limit argument tells us the total number of instructions
>>>>>>> + * across all CS slots in the ringbuffer, with some jobs requiring
>>>>>>> + * twice as many as others, depending on their profiling status.
>>>>>>> + */
>>>>>>> ret = drm_sched_init(&queue->scheduler, &panthor_queue_sched_ops,
>>>>>>> group->ptdev->scheduler->wq, 1,
>>>>>>> - args->ringbuf_size / (NUM_INSTRS_PER_SLOT * sizeof(u64)),
>>>>>>> + args->ringbuf_size / sizeof(u64),
>>>>>>> 0, msecs_to_jiffies(JOB_TIMEOUT_MS),
>>>>>>> group->ptdev->reset.wq,
>>>>>>> NULL, "panthor-queue", group->ptdev->base.dev);
>>>>>>> @@ -3354,6 +3595,7 @@ panthor_job_create(struct panthor_file *pfile,
>>>>>>> {
>>>>>>> struct panthor_group_pool *gpool = pfile->groups;
>>>>>>> struct panthor_job *job;
>>>>>>> + u32 credits;
>>>>>>> int ret;
>>>>>>>
>>>>>>> if (qsubmit->pad)
>>>>>>> @@ -3407,9 +3649,16 @@ panthor_job_create(struct panthor_file *pfile,
>>>>>>> }
>>>>>>> }
>>>>>>>
>>>>>>> + job->profiling.mask = pfile->ptdev->profile_mask;
>>>>>>> + credits = calc_job_credits(job->profiling.mask);
>>>>>>> + if (credits == 0) {
>>>>>>> + ret = -EINVAL;
>>>>>>> + goto err_put_job;
>>>>>>> + }
>>>>>>> +
>>>>>>> ret = drm_sched_job_init(&job->base,
>>>>>>> &job->group->queues[job->queue_idx]->entity,
>>>>>>> - 1, job->group);
>>>>>>> + credits, job->group);
>>>>>>> if (ret)
>>>>>>> goto err_put_job;
>>>>>>>
>>>>>
Hi,
This patch set is based on top of Yong Wu's restricted heap patch set [1].
It's also a continuation on Olivier's Add dma-buf secure-heap patch set [2].
The Linaro restricted heap uses genalloc in the kernel to manage the heap
carveout. This differs from the Mediatek restricted heap, which
relies on the secure world to manage the carveout.
I've tried to address the comments on [2], but [1] introduces changes so I'm
afraid I've had to skip some comments.
This can be tested on QEMU with the following steps:
repo init -u https://github.com/jenswi-linaro/manifest.git -m qemu_v8.xml \
-b prototype/sdp-v1
repo sync -j8
cd build
make toolchains -j4
make all -j$(nproc)
make run-only
# login and at the prompt:
xtest --sdp-basic
https://optee.readthedocs.io/en/latest/building/prerequisites.html
lists the dependencies needed to build the above.
The tests are pretty basic, mostly checking that a Trusted Application in
the secure world can access and manipulate the memory.
Cheers,
Jens
[1] https://lore.kernel.org/dri-devel/20240515112308.10171-1-yong.wu@mediatek.c…
[2] https://lore.kernel.org/lkml/20220805135330.970-1-olivier.masse@nxp.com/
Changes since Olivier's post [2]:
* Based on Yong Wu's post [1] where much of dma-buf handling is done in
the generic restricted heap
* Simplifications and cleanup
* New commit message for "dma-buf: heaps: add Linaro restricted dmabuf heap
support"
* Replaced the word "secure" with "restricted" where applicable
Etienne Carriere (1):
tee: new ioctl to a register tee_shm from a dmabuf file descriptor
Jens Wiklander (2):
dma-buf: heaps: restricted_heap: add no_map attribute
dma-buf: heaps: add Linaro restricted dmabuf heap support
Olivier Masse (1):
dt-bindings: reserved-memory: add linaro,restricted-heap
.../linaro,restricted-heap.yaml | 56 ++++++
drivers/dma-buf/heaps/Kconfig | 10 ++
drivers/dma-buf/heaps/Makefile | 1 +
drivers/dma-buf/heaps/restricted_heap.c | 17 +-
drivers/dma-buf/heaps/restricted_heap.h | 2 +
.../dma-buf/heaps/restricted_heap_linaro.c | 165 ++++++++++++++++++
drivers/tee/tee_core.c | 38 ++++
drivers/tee/tee_shm.c | 104 ++++++++++-
include/linux/tee_drv.h | 11 ++
include/uapi/linux/tee.h | 29 +++
10 files changed, 426 insertions(+), 7 deletions(-)
create mode 100644 Documentation/devicetree/bindings/reserved-memory/linaro,restricted-heap.yaml
create mode 100644 drivers/dma-buf/heaps/restricted_heap_linaro.c
--
2.34.1
Hi Adrián,
kernel test robot noticed the following build errors:
[auto build test ERROR on linus/master]
[also build test ERROR on v6.11 next-20240927]
[cannot apply to drm-misc/drm-misc-next]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]
url: https://github.com/intel-lab-lkp/linux/commits/Adri-n-Larumbe/drm-panthor-i…
base: linus/master
patch link: https://lore.kernel.org/r/20240923230912.2207320-4-adrian.larumbe%40collabo…
patch subject: [PATCH v8 3/5] drm/panthor: add DRM fdinfo support
config: arm-randconfig-002-20240929 (https://download.01.org/0day-ci/archive/20240929/202409291048.zLqDeqpO-lkp@…)
compiler: arm-linux-gnueabi-gcc (GCC) 14.1.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20240929/202409291048.zLqDeqpO-lkp@…)
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp(a)intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202409291048.zLqDeqpO-lkp@intel.com/
All errors (new ones prefixed by >>):
In file included from include/linux/math64.h:6,
from include/linux/time.h:6,
from include/linux/stat.h:19,
from include/linux/module.h:13,
from drivers/gpu/drm/panthor/panthor_drv.c:7:
drivers/gpu/drm/panthor/panthor_drv.c: In function 'panthor_gpu_show_fdinfo':
>> drivers/gpu/drm/panthor/panthor_drv.c:1389:45: error: implicit declaration of function 'arch_timer_get_cntfrq' [-Wimplicit-function-declaration]
1389 | arch_timer_get_cntfrq()));
| ^~~~~~~~~~~~~~~~~~~~~
include/linux/math.h:40:39: note: in definition of macro 'DIV_ROUND_DOWN_ULL'
40 | ({ unsigned long long _tmp = (ll); do_div(_tmp, d); _tmp; })
| ^~
drivers/gpu/drm/panthor/panthor_drv.c:1388:28: note: in expansion of macro 'DIV_ROUND_UP_ULL'
1388 | DIV_ROUND_UP_ULL((pfile->stats.time * NSEC_PER_SEC),
| ^~~~~~~~~~~~~~~~
vim +/arch_timer_get_cntfrq +1389 drivers/gpu/drm/panthor/panthor_drv.c
1377
1378 static void panthor_gpu_show_fdinfo(struct panthor_device *ptdev,
1379 struct panthor_file *pfile,
1380 struct drm_printer *p)
1381 {
1382 if (ptdev->profile_mask & PANTHOR_DEVICE_PROFILING_ALL)
1383 panthor_fdinfo_gather_group_samples(pfile);
1384
1385 if (ptdev->profile_mask & PANTHOR_DEVICE_PROFILING_TIMESTAMP) {
1386 #ifdef CONFIG_ARM_ARCH_TIMER
1387 drm_printf(p, "drm-engine-panthor:\t%llu ns\n",
1388 DIV_ROUND_UP_ULL((pfile->stats.time * NSEC_PER_SEC),
> 1389 arch_timer_get_cntfrq()));
1390 #endif
1391 }
1392 if (ptdev->profile_mask & PANTHOR_DEVICE_PROFILING_CYCLES)
1393 drm_printf(p, "drm-cycles-panthor:\t%llu\n", pfile->stats.cycles);
1394
1395 drm_printf(p, "drm-maxfreq-panthor:\t%lu Hz\n", ptdev->fast_rate);
1396 drm_printf(p, "drm-curfreq-panthor:\t%lu Hz\n", ptdev->current_frequency);
1397 }
1398
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
Nothing wrong with this, I just didn't have time to double-check it
myself and then forgot about it.
Going to push it to drm-misc-next.
Regards,
Christian.
Am 23.09.24 um 11:22 schrieb Tommy Chiang:
> Ping.
> Please let me know if I'm doing something wrong.
>
> On Mon, Feb 19, 2024 at 11:00 AM Tommy Chiang <ototot(a)chromium.org> wrote:
>> Kindly ping :)
>>
>> On Fri, Jan 19, 2024 at 11:33 AM Tommy Chiang <ototot(a)chromium.org> wrote:
>>> This patch tries to improve the display of the code listing
>>> on The Linux Kernel documentation website for dma-buf [1] .
>>>
>>> Originally, it appears that it was attempting to escape
>>> the '*' character, but it looks like that's no longer necessary,
>>> so we are seeing something like '\*' on the website.
>>>
>>> This patch removes these unnecessary backslashes and adds syntax
>>> highlighting to improve the readability of the code listing.
>>>
>>> [1] https://docs.kernel.org/driver-api/dma-buf.html
>>>
>>> Signed-off-by: Tommy Chiang <ototot(a)chromium.org>
>>> ---
>>> drivers/dma-buf/dma-buf.c | 15 +++++++++------
>>> 1 file changed, 9 insertions(+), 6 deletions(-)
>>>
>>> diff --git a/drivers/dma-buf/dma-buf.c b/drivers/dma-buf/dma-buf.c
>>> index 8fe5aa67b167..e083a0ab06d7 100644
>>> --- a/drivers/dma-buf/dma-buf.c
>>> +++ b/drivers/dma-buf/dma-buf.c
>>> @@ -1282,10 +1282,12 @@ EXPORT_SYMBOL_NS_GPL(dma_buf_move_notify, DMA_BUF);
>>> * vmap interface is introduced. Note that on very old 32-bit architectures
>>> * vmalloc space might be limited and result in vmap calls failing.
>>> *
>>> - * Interfaces::
>>> + * Interfaces:
>>> *
>>> - * void \*dma_buf_vmap(struct dma_buf \*dmabuf, struct iosys_map \*map)
>>> - * void dma_buf_vunmap(struct dma_buf \*dmabuf, struct iosys_map \*map)
>>> + * .. code-block:: c
>>> + *
>>> + * void *dma_buf_vmap(struct dma_buf *dmabuf, struct iosys_map *map)
>>> + * void dma_buf_vunmap(struct dma_buf *dmabuf, struct iosys_map *map)
>>> *
>>> * The vmap call can fail if there is no vmap support in the exporter, or if
>>> * it runs out of vmalloc space. Note that the dma-buf layer keeps a reference
>>> @@ -1342,10 +1344,11 @@ EXPORT_SYMBOL_NS_GPL(dma_buf_move_notify, DMA_BUF);
>>> * enough, since adding interfaces to intercept pagefaults and allow pte
>>> * shootdowns would increase the complexity quite a bit.
>>> *
>>> - * Interface::
>>> + * Interface:
>>> + *
>>> + * .. code-block:: c
>>> *
>>> - * int dma_buf_mmap(struct dma_buf \*, struct vm_area_struct \*,
>>> - * unsigned long);
>>> + * int dma_buf_mmap(struct dma_buf *, struct vm_area_struct *, unsigned long);
>>> *
>>> * If the importing subsystem simply provides a special-purpose mmap call to
>>> * set up a mapping in userspace, calling do_mmap with &dma_buf.file will
>>> --
>>> 2.43.0.381.gb435a96ce8-goog
>>>
On 20/09/2024 23:36, Adrián Larumbe wrote:
> Hi Steve, thanks for the review.
Hi Adrián,
> I've applied all of your suggestions for the next patch series revision, so I'll
> only be answering to your question about the calc_profiling_ringbuf_num_slots
> function further down below.
>
[...]
>>> @@ -3003,6 +3190,34 @@ static const struct drm_sched_backend_ops panthor_queue_sched_ops = {
>>> .free_job = queue_free_job,
>>> };
>>>
>>> +static u32 calc_profiling_ringbuf_num_slots(struct panthor_device *ptdev,
>>> + u32 cs_ringbuf_size)
>>> +{
>>> + u32 min_profiled_job_instrs = U32_MAX;
>>> + u32 last_flag = fls(PANTHOR_DEVICE_PROFILING_ALL);
>>> +
>>> + /*
>>> + * We want to calculate the minimum size of a profiled job's CS,
>>> + * because since they need additional instructions for the sampling
>>> + * of performance metrics, they might take up further slots in
>>> + * the queue's ringbuffer. This means we might not need as many job
>>> + * slots for keeping track of their profiling information. What we
>>> + * need is the maximum number of slots we should allocate to this end,
>>> + * which matches the maximum number of profiled jobs we can place
>>> + * simultaneously in the queue's ring buffer.
>>> + * That has to be calculated separately for every single job profiling
>>> + * flag, but not in the case job profiling is disabled, since unprofiled
>>> + * jobs don't need to keep track of this at all.
>>> + */
>>> + for (u32 i = 0; i < last_flag; i++) {
>>> + if (BIT(i) & PANTHOR_DEVICE_PROFILING_ALL)
>>> + min_profiled_job_instrs =
>>> + min(min_profiled_job_instrs, calc_job_credits(BIT(i)));
>>> + }
>>> +
>>> + return DIV_ROUND_UP(cs_ringbuf_size, min_profiled_job_instrs * sizeof(u64));
>>> +}
>>
>> I may be missing something, but is there a situation where this is
>> different to calc_job_credits(0)? AFAICT the infrastructure you've added
>> can only add extra instructions to the no-flags case - whereas this
>> implies you're thinking that instructions may also be removed (or replaced).
>>
>> Steve
>
> Since we create a separate kernel BO to hold the profiling information slots, we
> need one able to accommodate as many slots as the maximum number of profiled
> jobs we can insert simultaneously into the queue's ring buffer. Because profiled
> jobs always take more instructions than unprofiled ones, we will usually need
> fewer slots than the number of unprofiled jobs we could insert at once in the
> ring buffer.
>
> Because we represent profiling metrics with a bit mask, we need to test the
> size of the CS for every single metric enabled in isolation, since enabling more
> than one will always mean a bigger CS, and therefore fewer jobs tracked at once
> in the queue's ring buffer.
>
> In our case, calling calc_job_credits(0) would simply tell us the number of
> instructions we need for a normal job with no profiled features enabled, which
> will always require fewer instructions than profiled ones, and therefore more
> slots in the profiling info kernel BO. But we don't need to keep track of
> profiling numbers for unprofiled jobs, so there's no point in calculating this
> number.
>
> At first I was simply allocating a profiling info kernel BO as big as the number
> of simultaneous unprofiled job slots in the ring queue, but Boris pointed out
> that since queue ring buffers can be as big as 2GiB, a lot of this memory would
> be wasted: profiled jobs always hold more instructions, so they take up more
> ring buffer space, and therefore fewer profiling slots are needed in said
> kernel BO.
>
> The value of this approach will eventually manifest if we decide to keep track
> of more profiling metrics, since this code won't have to change at all, other
> than adding new profiling flags in the panthor_device_profiling_flags enum.
Thanks for the detailed explanation. I think what I was missing is that
the loop is checking each bit flag independently and *not* checking
calc_job_credits(0).
The check for (BIT(i) & PANTHOR_DEVICE_PROFILING_ALL) is probably what
confused me - that should be completely redundant. Or at least we need
something more intelligent if we have profiling bits which are not
mutually compatible.
I'm also not entirely sure that the amount of RAM saved is significant,
but you've already written the code so we might as well have the saving ;)
Thanks,
Steve
> Regards,
> Adrian
>
>>> +
>>> static struct panthor_queue *
>>> group_create_queue(struct panthor_group *group,
>>> const struct drm_panthor_queue_create *args)
>>> @@ -3056,9 +3271,35 @@ group_create_queue(struct panthor_group *group,
>>> goto err_free_queue;
>>> }
>>>
>>> + queue->profiling.slot_count =
>>> + calc_profiling_ringbuf_num_slots(group->ptdev, args->ringbuf_size);
>>> +
>>> + queue->profiling.slots =
>>> + panthor_kernel_bo_create(group->ptdev, group->vm,
>>> + queue->profiling.slot_count *
>>> + sizeof(struct panthor_job_profiling_data),
>>> + DRM_PANTHOR_BO_NO_MMAP,
>>> + DRM_PANTHOR_VM_BIND_OP_MAP_NOEXEC |
>>> + DRM_PANTHOR_VM_BIND_OP_MAP_UNCACHED,
>>> + PANTHOR_VM_KERNEL_AUTO_VA);
>>> +
>>> + if (IS_ERR(queue->profiling.slots)) {
>>> + ret = PTR_ERR(queue->profiling.slots);
>>> + goto err_free_queue;
>>> + }
>>> +
>>> + ret = panthor_kernel_bo_vmap(queue->profiling.slots);
>>> + if (ret)
>>> + goto err_free_queue;
>>> +
>>> + /*
>>> + * Credit limit argument tells us the total number of instructions
>>> + * across all CS slots in the ringbuffer, with some jobs requiring
>>> + * twice as many as others, depending on their profiling status.
>>> + */
>>> ret = drm_sched_init(&queue->scheduler, &panthor_queue_sched_ops,
>>> group->ptdev->scheduler->wq, 1,
>>> - args->ringbuf_size / (NUM_INSTRS_PER_SLOT * sizeof(u64)),
>>> + args->ringbuf_size / sizeof(u64),
>>> 0, msecs_to_jiffies(JOB_TIMEOUT_MS),
>>> group->ptdev->reset.wq,
>>> NULL, "panthor-queue", group->ptdev->base.dev);
>>> @@ -3354,6 +3595,7 @@ panthor_job_create(struct panthor_file *pfile,
>>> {
>>> struct panthor_group_pool *gpool = pfile->groups;
>>> struct panthor_job *job;
>>> + u32 credits;
>>> int ret;
>>>
>>> if (qsubmit->pad)
>>> @@ -3407,9 +3649,16 @@ panthor_job_create(struct panthor_file *pfile,
>>> }
>>> }
>>>
>>> + job->profiling.mask = pfile->ptdev->profile_mask;
>>> + credits = calc_job_credits(job->profiling.mask);
>>> + if (credits == 0) {
>>> + ret = -EINVAL;
>>> + goto err_put_job;
>>> + }
>>> +
>>> ret = drm_sched_job_init(&job->base,
>>> &job->group->queues[job->queue_idx]->entity,
>>> - 1, job->group);
>>> + credits, job->group);
>>> if (ret)
>>> goto err_put_job;
>>>
>
Consider the following call sequence:
/* Upper layer */
dma_fence_begin_signalling();
lock(tainted_shared_lock);
/* Driver callback */
dma_fence_begin_signalling();
...
The driver might here use a utility that is annotated as intended for the
dma-fence signalling critical path. Now if the upper layer isn't correctly
annotated yet for whatever reason, resulting in
/* Upper layer */
lock(tainted_shared_lock);
/* Driver callback */
dma_fence_begin_signalling();
We will receive a false lockdep locking order violation notification from
dma_fence_begin_signalling(). However, entering a dma-fence signalling
critical section itself doesn't block, so it cannot cause a deadlock.
So use a successful read_trylock() annotation instead for
dma_fence_begin_signalling(). That will make sure that the locking order
is correctly registered in the first case, and doesn't register any
locking order in the second case.
The alternative is of course to make sure that the "Upper layer" is always
correctly annotated. But experience shows that's not easily achievable
in all cases.
Signed-off-by: Thomas Hellström <thomas.hellstrom(a)linux.intel.com>
---
drivers/dma-buf/dma-fence.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/drivers/dma-buf/dma-fence.c b/drivers/dma-buf/dma-fence.c
index f177c56269bb..17f632768ef9 100644
--- a/drivers/dma-buf/dma-fence.c
+++ b/drivers/dma-buf/dma-fence.c
@@ -308,8 +308,8 @@ bool dma_fence_begin_signalling(void)
if (in_atomic())
return true;
- /* ... and non-recursive readlock */
- lock_acquire(&dma_fence_lockdep_map, 0, 0, 1, 1, NULL, _RET_IP_);
+ /* ... and non-recursive successful read_trylock */
+ lock_acquire(&dma_fence_lockdep_map, 0, 1, 1, 1, NULL, _RET_IP_);
return false;
}
@@ -340,7 +340,7 @@ void __dma_fence_might_wait(void)
lock_map_acquire(&dma_fence_lockdep_map);
lock_map_release(&dma_fence_lockdep_map);
if (tmp)
- lock_acquire(&dma_fence_lockdep_map, 0, 0, 1, 1, NULL, _THIS_IP_);
+ lock_acquire(&dma_fence_lockdep_map, 0, 1, 1, 1, NULL, _THIS_IP_);
}
#endif
--
2.39.2
Hi Adrián,
kernel test robot noticed the following build errors:
[auto build test ERROR on linus/master]
[also build test ERROR on v6.11-rc7 next-20240913]
[cannot apply to drm-misc/drm-misc-next]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]
url: https://github.com/intel-lab-lkp/linux/commits/Adri-n-Larumbe/drm-panthor-i…
base: linus/master
patch link: https://lore.kernel.org/r/20240913124857.389630-2-adrian.larumbe%40collabor…
patch subject: [PATCH v6 1/5] drm/panthor: introduce job cycle and timestamp accounting
config: i386-buildonly-randconfig-003-20240915 (https://download.01.org/0day-ci/archive/20240915/202409152243.r3t2jdOJ-lkp@…)
compiler: clang version 18.1.8 (https://github.com/llvm/llvm-project 3b5b5c1ec4a3095ab096dd780e84d7ab81f3d7ff)
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20240915/202409152243.r3t2jdOJ-lkp@…)
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp(a)intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202409152243.r3t2jdOJ-lkp@intel.com/
All errors (new ones prefixed by >>):
>> drivers/gpu/drm/panthor/panthor_sched.c:2885:12: error: call to '__compiletime_assert_371' declared with 'error' attribute: min(ringbuf_size - start, size) signedness error
2885 | written = min(ringbuf_size - start, size);
| ^
include/linux/minmax.h:129:19: note: expanded from macro 'min'
129 | #define min(x, y) __careful_cmp(min, x, y)
| ^
include/linux/minmax.h:105:2: note: expanded from macro '__careful_cmp'
105 | __careful_cmp_once(op, x, y, __UNIQUE_ID(x_), __UNIQUE_ID(y_))
| ^
include/linux/minmax.h:100:2: note: expanded from macro '__careful_cmp_once'
100 | BUILD_BUG_ON_MSG(!__types_ok(x,y,ux,uy), \
| ^
note: (skipping 2 expansions in backtrace; use -fmacro-backtrace-limit=0 to see all)
include/linux/compiler_types.h:498:2: note: expanded from macro '_compiletime_assert'
498 | __compiletime_assert(condition, msg, prefix, suffix)
| ^
include/linux/compiler_types.h:491:4: note: expanded from macro '__compiletime_assert'
491 | prefix ## suffix(); \
| ^
<scratch space>:68:1: note: expanded from here
68 | __compiletime_assert_371
| ^
1 error generated.
vim +2885 drivers/gpu/drm/panthor/panthor_sched.c
2862
2863 #define JOB_INSTR(__prof, __instr) \
2864 { \
2865 .profile_mask = __prof, \
2866 .instr = __instr, \
2867 }
2868
2869 static void
2870 copy_instrs_to_ringbuf(struct panthor_queue *queue,
2871 struct panthor_job *job,
2872 struct panthor_job_ringbuf_instrs *instrs)
2873 {
2874 ssize_t ringbuf_size = panthor_kernel_bo_size(queue->ringbuf);
2875 u32 start = job->ringbuf.start & (ringbuf_size - 1);
2876 ssize_t size, written;
2877
2878 /*
2879 * We need to write a whole slot, including any trailing zeroes
2880 * that may come at the end of it. Also, because instrs.buffer has
2881 * been zero-initialised, there's no need to pad it with 0's
2882 */
2883 instrs->count = ALIGN(instrs->count, NUM_INSTRS_PER_CACHE_LINE);
2884 size = instrs->count * sizeof(u64);
> 2885 written = min(ringbuf_size - start, size);
2886
2887 memcpy(queue->ringbuf->kmap + start, instrs->buffer, written);
2888
2889 if (written < size)
2890 memcpy(queue->ringbuf->kmap,
2891 &instrs->buffer[(ringbuf_size - start)/sizeof(u64)],
2892 size - written);
2893 }
2894
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
Hi Adrián,
kernel test robot noticed the following build warnings:
[auto build test WARNING on linus/master]
[also build test WARNING on v6.11-rc7 next-20240913]
[cannot apply to drm-misc/drm-misc-next]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]
url: https://github.com/intel-lab-lkp/linux/commits/Adri-n-Larumbe/drm-panthor-i…
base: linus/master
patch link: https://lore.kernel.org/r/20240913124857.389630-2-adrian.larumbe%40collabor…
patch subject: [PATCH v6 1/5] drm/panthor: introduce job cycle and timestamp accounting
config: alpha-allyesconfig (https://download.01.org/0day-ci/archive/20240914/202409140506.OBoqSiVk-lkp@…)
compiler: alpha-linux-gcc (GCC) 13.3.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20240914/202409140506.OBoqSiVk-lkp@…)
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp(a)intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202409140506.OBoqSiVk-lkp@intel.com/
All warnings (new ones prefixed by >>):
drivers/gpu/drm/panthor/panthor_sched.c:322: warning: Excess struct member 'runnable' description in 'panthor_scheduler'
drivers/gpu/drm/panthor/panthor_sched.c:322: warning: Excess struct member 'idle' description in 'panthor_scheduler'
drivers/gpu/drm/panthor/panthor_sched.c:322: warning: Excess struct member 'waiting' description in 'panthor_scheduler'
drivers/gpu/drm/panthor/panthor_sched.c:322: warning: Excess struct member 'has_ref' description in 'panthor_scheduler'
drivers/gpu/drm/panthor/panthor_sched.c:322: warning: Excess struct member 'in_progress' description in 'panthor_scheduler'
drivers/gpu/drm/panthor/panthor_sched.c:322: warning: Excess struct member 'stopped_groups' description in 'panthor_scheduler'
>> drivers/gpu/drm/panthor/panthor_sched.c:494: warning: Function parameter or struct member 'profiling' not described in 'panthor_queue'
drivers/gpu/drm/panthor/panthor_sched.c:494: warning: Excess struct member 'mem' description in 'panthor_queue'
drivers/gpu/drm/panthor/panthor_sched.c:494: warning: Excess struct member 'input' description in 'panthor_queue'
drivers/gpu/drm/panthor/panthor_sched.c:494: warning: Excess struct member 'output' description in 'panthor_queue'
drivers/gpu/drm/panthor/panthor_sched.c:494: warning: Excess struct member 'input_fw_va' description in 'panthor_queue'
drivers/gpu/drm/panthor/panthor_sched.c:494: warning: Excess struct member 'output_fw_va' description in 'panthor_queue'
drivers/gpu/drm/panthor/panthor_sched.c:494: warning: Excess struct member 'gpu_va' description in 'panthor_queue'
drivers/gpu/drm/panthor/panthor_sched.c:494: warning: Excess struct member 'ref' description in 'panthor_queue'
drivers/gpu/drm/panthor/panthor_sched.c:494: warning: Excess struct member 'gt' description in 'panthor_queue'
drivers/gpu/drm/panthor/panthor_sched.c:494: warning: Excess struct member 'sync64' description in 'panthor_queue'
drivers/gpu/drm/panthor/panthor_sched.c:494: warning: Excess struct member 'bo' description in 'panthor_queue'
drivers/gpu/drm/panthor/panthor_sched.c:494: warning: Excess struct member 'offset' description in 'panthor_queue'
drivers/gpu/drm/panthor/panthor_sched.c:494: warning: Excess struct member 'kmap' description in 'panthor_queue'
drivers/gpu/drm/panthor/panthor_sched.c:494: warning: Excess struct member 'lock' description in 'panthor_queue'
drivers/gpu/drm/panthor/panthor_sched.c:494: warning: Excess struct member 'id' description in 'panthor_queue'
drivers/gpu/drm/panthor/panthor_sched.c:494: warning: Excess struct member 'seqno' description in 'panthor_queue'
drivers/gpu/drm/panthor/panthor_sched.c:494: warning: Excess struct member 'last_fence' description in 'panthor_queue'
drivers/gpu/drm/panthor/panthor_sched.c:494: warning: Excess struct member 'in_flight_jobs' description in 'panthor_queue'
>> drivers/gpu/drm/panthor/panthor_sched.c:494: warning: Excess struct member 'profiling_info' description in 'panthor_queue'
drivers/gpu/drm/panthor/panthor_sched.c:494: warning: Excess struct member 'slots' description in 'panthor_queue'
drivers/gpu/drm/panthor/panthor_sched.c:494: warning: Excess struct member 'slot_count' description in 'panthor_queue'
drivers/gpu/drm/panthor/panthor_sched.c:494: warning: Excess struct member 'profiling_seqno' description in 'panthor_queue'
drivers/gpu/drm/panthor/panthor_sched.c:813: warning: Excess struct member 'start' description in 'panthor_job'
drivers/gpu/drm/panthor/panthor_sched.c:813: warning: Excess struct member 'size' description in 'panthor_job'
drivers/gpu/drm/panthor/panthor_sched.c:813: warning: Excess struct member 'latest_flush' description in 'panthor_job'
drivers/gpu/drm/panthor/panthor_sched.c:813: warning: Excess struct member 'start' description in 'panthor_job'
drivers/gpu/drm/panthor/panthor_sched.c:813: warning: Excess struct member 'end' description in 'panthor_job'
>> drivers/gpu/drm/panthor/panthor_sched.c:813: warning: Excess struct member 'mask' description in 'panthor_job'
>> drivers/gpu/drm/panthor/panthor_sched.c:813: warning: Excess struct member 'slot' description in 'panthor_job'
drivers/gpu/drm/panthor/panthor_sched.c:1734: warning: Function parameter or struct member 'ptdev' not described in 'panthor_sched_report_fw_events'
drivers/gpu/drm/panthor/panthor_sched.c:1734: warning: Function parameter or struct member 'events' not described in 'panthor_sched_report_fw_events'
drivers/gpu/drm/panthor/panthor_sched.c:2626: warning: Function parameter or struct member 'ptdev' not described in 'panthor_sched_report_mmu_fault'
vim +494 drivers/gpu/drm/panthor/panthor_sched.c
de85488138247d Boris Brezillon 2024-02-29 397
de85488138247d Boris Brezillon 2024-02-29 398 /** @ringbuf: Command stream ring-buffer. */
de85488138247d Boris Brezillon 2024-02-29 399 struct panthor_kernel_bo *ringbuf;
de85488138247d Boris Brezillon 2024-02-29 400
de85488138247d Boris Brezillon 2024-02-29 401 /** @iface: Firmware interface. */
de85488138247d Boris Brezillon 2024-02-29 402 struct {
de85488138247d Boris Brezillon 2024-02-29 403 /** @mem: FW memory allocated for this interface. */
de85488138247d Boris Brezillon 2024-02-29 404 struct panthor_kernel_bo *mem;
de85488138247d Boris Brezillon 2024-02-29 405
de85488138247d Boris Brezillon 2024-02-29 406 /** @input: Input interface. */
de85488138247d Boris Brezillon 2024-02-29 407 struct panthor_fw_ringbuf_input_iface *input;
de85488138247d Boris Brezillon 2024-02-29 408
de85488138247d Boris Brezillon 2024-02-29 409 /** @output: Output interface. */
de85488138247d Boris Brezillon 2024-02-29 410 const struct panthor_fw_ringbuf_output_iface *output;
de85488138247d Boris Brezillon 2024-02-29 411
de85488138247d Boris Brezillon 2024-02-29 412 /** @input_fw_va: FW virtual address of the input interface buffer. */
de85488138247d Boris Brezillon 2024-02-29 413 u32 input_fw_va;
de85488138247d Boris Brezillon 2024-02-29 414
de85488138247d Boris Brezillon 2024-02-29 415 /** @output_fw_va: FW virtual address of the output interface buffer. */
de85488138247d Boris Brezillon 2024-02-29 416 u32 output_fw_va;
de85488138247d Boris Brezillon 2024-02-29 417 } iface;
de85488138247d Boris Brezillon 2024-02-29 418
de85488138247d Boris Brezillon 2024-02-29 419 /**
de85488138247d Boris Brezillon 2024-02-29 420 * @syncwait: Stores information about the synchronization object this
de85488138247d Boris Brezillon 2024-02-29 421 * queue is waiting on.
de85488138247d Boris Brezillon 2024-02-29 422 */
de85488138247d Boris Brezillon 2024-02-29 423 struct {
de85488138247d Boris Brezillon 2024-02-29 424 /** @gpu_va: GPU address of the synchronization object. */
de85488138247d Boris Brezillon 2024-02-29 425 u64 gpu_va;
de85488138247d Boris Brezillon 2024-02-29 426
de85488138247d Boris Brezillon 2024-02-29 427 /** @ref: Reference value to compare against. */
de85488138247d Boris Brezillon 2024-02-29 428 u64 ref;
de85488138247d Boris Brezillon 2024-02-29 429
de85488138247d Boris Brezillon 2024-02-29 430 /** @gt: True if this is a greater-than test. */
de85488138247d Boris Brezillon 2024-02-29 431 bool gt;
de85488138247d Boris Brezillon 2024-02-29 432
de85488138247d Boris Brezillon 2024-02-29 433 /** @sync64: True if this is a 64-bit sync object. */
de85488138247d Boris Brezillon 2024-02-29 434 bool sync64;
de85488138247d Boris Brezillon 2024-02-29 435
de85488138247d Boris Brezillon 2024-02-29 436 /** @bo: Buffer object holding the synchronization object. */
de85488138247d Boris Brezillon 2024-02-29 437 struct drm_gem_object *obj;
de85488138247d Boris Brezillon 2024-02-29 438
de85488138247d Boris Brezillon 2024-02-29 439 /** @offset: Offset of the synchronization object inside @bo. */
de85488138247d Boris Brezillon 2024-02-29 440 u64 offset;
de85488138247d Boris Brezillon 2024-02-29 441
de85488138247d Boris Brezillon 2024-02-29 442 /**
de85488138247d Boris Brezillon 2024-02-29 443 * @kmap: Kernel mapping of the buffer object holding the
de85488138247d Boris Brezillon 2024-02-29 444 * synchronization object.
de85488138247d Boris Brezillon 2024-02-29 445 */
de85488138247d Boris Brezillon 2024-02-29 446 void *kmap;
de85488138247d Boris Brezillon 2024-02-29 447 } syncwait;
de85488138247d Boris Brezillon 2024-02-29 448
de85488138247d Boris Brezillon 2024-02-29 449 /** @fence_ctx: Fence context fields. */
de85488138247d Boris Brezillon 2024-02-29 450 struct {
de85488138247d Boris Brezillon 2024-02-29 451 /** @lock: Used to protect access to all fences allocated by this context. */
de85488138247d Boris Brezillon 2024-02-29 452 spinlock_t lock;
de85488138247d Boris Brezillon 2024-02-29 453
de85488138247d Boris Brezillon 2024-02-29 454 /**
de85488138247d Boris Brezillon 2024-02-29 455 * @id: Fence context ID.
de85488138247d Boris Brezillon 2024-02-29 456 *
de85488138247d Boris Brezillon 2024-02-29 457 * Allocated with dma_fence_context_alloc().
de85488138247d Boris Brezillon 2024-02-29 458 */
de85488138247d Boris Brezillon 2024-02-29 459 u64 id;
de85488138247d Boris Brezillon 2024-02-29 460
de85488138247d Boris Brezillon 2024-02-29 461 /** @seqno: Sequence number of the last initialized fence. */
de85488138247d Boris Brezillon 2024-02-29 462 atomic64_t seqno;
de85488138247d Boris Brezillon 2024-02-29 463
7b6f9ec6ad5112 Boris Brezillon 2024-07-03 464 /**
7b6f9ec6ad5112 Boris Brezillon 2024-07-03 465 * @last_fence: Fence of the last submitted job.
7b6f9ec6ad5112 Boris Brezillon 2024-07-03 466 *
7b6f9ec6ad5112 Boris Brezillon 2024-07-03 467 * We return this fence when we get an empty command stream.
7b6f9ec6ad5112 Boris Brezillon 2024-07-03 468 * This way, we are guaranteed that all earlier jobs have completed
7b6f9ec6ad5112 Boris Brezillon 2024-07-03 469 * when drm_sched_job::s_fence::finished without having to feed
7b6f9ec6ad5112 Boris Brezillon 2024-07-03 470 * the CS ring buffer with a dummy job that only signals the fence.
7b6f9ec6ad5112 Boris Brezillon 2024-07-03 471 */
7b6f9ec6ad5112 Boris Brezillon 2024-07-03 472 struct dma_fence *last_fence;
7b6f9ec6ad5112 Boris Brezillon 2024-07-03 473
de85488138247d Boris Brezillon 2024-02-29 474 /**
de85488138247d Boris Brezillon 2024-02-29 475 * @in_flight_jobs: List containing all in-flight jobs.
de85488138247d Boris Brezillon 2024-02-29 476 *
de85488138247d Boris Brezillon 2024-02-29 477 * Used to keep track and signal panthor_job::done_fence when the
de85488138247d Boris Brezillon 2024-02-29 478 * synchronization object attached to the queue is signaled.
de85488138247d Boris Brezillon 2024-02-29 479 */
de85488138247d Boris Brezillon 2024-02-29 480 struct list_head in_flight_jobs;
de85488138247d Boris Brezillon 2024-02-29 481 } fence_ctx;
a706810cebb072 Adrián Larumbe 2024-09-13 482
a706810cebb072 Adrián Larumbe 2024-09-13 483 /** @profiling_info: Job profiling data slots and access information. */
a706810cebb072 Adrián Larumbe 2024-09-13 484 struct {
a706810cebb072 Adrián Larumbe 2024-09-13 485 /** @slots: Kernel BO holding the slots. */
a706810cebb072 Adrián Larumbe 2024-09-13 486 struct panthor_kernel_bo *slots;
a706810cebb072 Adrián Larumbe 2024-09-13 487
a706810cebb072 Adrián Larumbe 2024-09-13 488 /** @slot_count: Number of jobs ringbuffer can hold at once. */
a706810cebb072 Adrián Larumbe 2024-09-13 489 u32 slot_count;
a706810cebb072 Adrián Larumbe 2024-09-13 490
a706810cebb072 Adrián Larumbe 2024-09-13 491 /** @profiling_seqno: Index of the next available profiling information slot. */
a706810cebb072 Adrián Larumbe 2024-09-13 492 u32 seqno;
a706810cebb072 Adrián Larumbe 2024-09-13 493 } profiling;
de85488138247d Boris Brezillon 2024-02-29 @494 };
de85488138247d Boris Brezillon 2024-02-29 495
On Mon, 22 Jul 2024 08:53:29 +0200, Alexandre Mergnat wrote:
> This series aims to add the following audio support for the Genio 350-evk:
> - Playback
> - 2ch Headset Jack (Earphone)
> - 1ch Line-out Jack (Speaker)
> - 8ch HDMI Tx
> - Capture
> - 1ch DMIC (On-board Digital Microphone)
> - 1ch AMIC (On-board Analogic Microphone)
> - 1ch Headset Jack (External Analogic Microphone)
>
> [...]
Applied to
https://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound.git for-next
Thanks!
[01/16] ASoC: dt-bindings: mediatek,mt8365-afe: Add audio afe document
commit: ceb3ca2876243e3ea02f78b3d488b1f2d734de49
[02/16] ASoC: dt-bindings: mediatek,mt8365-mt6357: Add audio sound card document
commit: 76d80dcdd55f70b28930edb97b96ee375e1cce5a
[03/16] dt-bindings: mfd: mediatek: Add codec property for MT6357 PMIC
commit: 761cab667898d86c04867948f1b7aec1090be796
[04/16] ASoC: mediatek: mt8365: Add common header
commit: 38c7c9ddc74033406461d64e541bbc8268e77f73
[05/16] ASoC: mediatek: mt8365: Add audio clock control support
commit: ef307b40b7f0042d54f020bccb3e728ced292282
[06/16] ASoC: mediatek: mt8365: Add I2S DAI support
commit: 402bbb13a195caa83b3279ebecdabfb11ddee084
[07/16] ASoC: mediatek: mt8365: Add ADDA DAI support
commit: 7c58c88e524180e8439acdfc44872325e7f6d33d
[08/16] ASoC: mediatek: mt8365: Add DMIC DAI support
commit: 1c50ec75ce6c0c6b5736499393e522f73e19d0cf
[09/16] ASoC: mediatek: mt8365: Add PCM DAI support
commit: 5097c0c8634d703e3c59cfb89831b7db9dc46339
[10/16] ASoc: mediatek: mt8365: Add a specific soundcard for EVK
commit: 1bf6dbd75f7603dd026660bebf324f812200dc1b
[11/16] ASoC: mediatek: mt8365: Add the AFE driver support
commit: e1991d102bc2abb32331c462f8f3e77059c69578
[12/16] ASoC: codecs: add MT6357 support
(no commit info)
[13/16] ASoC: mediatek: Add MT8365 support
(no commit info)
All being well this means that it will be integrated into the linux-next
tree (usually sometime in the next 24 hours) and sent to Linus during
the next merge window (or sooner if it is a bug fix), however if
problems are discovered then the patch may be dropped or reverted.
You may get further e-mails resulting from automated or manual testing
and review of the tree, please engage with people reporting problems and
send followup patches addressing any issues that are reported if needed.
If any updates are required or you are submitting further changes they
should be sent as incremental updates against current git, existing
patches will not be replaced.
Please add any relevant lists and maintainers to the CCs when replying
to this mail.
Thanks,
Mark
Hi Adrián,
kernel test robot noticed the following build warnings:
[auto build test WARNING on drm-misc/drm-misc-next]
[also build test WARNING on linus/master v6.11-rc6 next-20240904]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]
url: https://github.com/intel-lab-lkp/linux/commits/Adri-n-Larumbe/drm-panthor-i…
base: git://anongit.freedesktop.org/drm/drm-misc drm-misc-next
patch link: https://lore.kernel.org/r/20240903202541.430225-3-adrian.larumbe%40collabor…
patch subject: [PATCH v5 2/4] drm/panthor: add DRM fdinfo support
config: x86_64-buildonly-randconfig-002-20240904 (https://download.01.org/0day-ci/archive/20240905/202409050134.uxrIkhqc-lkp@…)
compiler: gcc-12 (Debian 12.2.0-14) 12.2.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20240905/202409050134.uxrIkhqc-lkp@…)
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp(a)intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202409050134.uxrIkhqc-lkp@intel.com/
All warnings (new ones prefixed by >>):
drivers/gpu/drm/panthor/panthor_sched.c:322: warning: Excess struct member 'runnable' description in 'panthor_scheduler'
drivers/gpu/drm/panthor/panthor_sched.c:322: warning: Excess struct member 'idle' description in 'panthor_scheduler'
drivers/gpu/drm/panthor/panthor_sched.c:322: warning: Excess struct member 'waiting' description in 'panthor_scheduler'
drivers/gpu/drm/panthor/panthor_sched.c:322: warning: Excess struct member 'has_ref' description in 'panthor_scheduler'
drivers/gpu/drm/panthor/panthor_sched.c:322: warning: Excess struct member 'in_progress' description in 'panthor_scheduler'
drivers/gpu/drm/panthor/panthor_sched.c:322: warning: Excess struct member 'stopped_groups' description in 'panthor_scheduler'
drivers/gpu/drm/panthor/panthor_sched.c:494: warning: Excess struct member 'mem' description in 'panthor_queue'
drivers/gpu/drm/panthor/panthor_sched.c:494: warning: Excess struct member 'input' description in 'panthor_queue'
drivers/gpu/drm/panthor/panthor_sched.c:494: warning: Excess struct member 'output' description in 'panthor_queue'
drivers/gpu/drm/panthor/panthor_sched.c:494: warning: Excess struct member 'input_fw_va' description in 'panthor_queue'
drivers/gpu/drm/panthor/panthor_sched.c:494: warning: Excess struct member 'output_fw_va' description in 'panthor_queue'
drivers/gpu/drm/panthor/panthor_sched.c:494: warning: Excess struct member 'gpu_va' description in 'panthor_queue'
drivers/gpu/drm/panthor/panthor_sched.c:494: warning: Excess struct member 'ref' description in 'panthor_queue'
drivers/gpu/drm/panthor/panthor_sched.c:494: warning: Excess struct member 'gt' description in 'panthor_queue'
drivers/gpu/drm/panthor/panthor_sched.c:494: warning: Excess struct member 'sync64' description in 'panthor_queue'
drivers/gpu/drm/panthor/panthor_sched.c:494: warning: Excess struct member 'bo' description in 'panthor_queue'
drivers/gpu/drm/panthor/panthor_sched.c:494: warning: Excess struct member 'offset' description in 'panthor_queue'
drivers/gpu/drm/panthor/panthor_sched.c:494: warning: Excess struct member 'kmap' description in 'panthor_queue'
drivers/gpu/drm/panthor/panthor_sched.c:494: warning: Excess struct member 'lock' description in 'panthor_queue'
drivers/gpu/drm/panthor/panthor_sched.c:494: warning: Excess struct member 'id' description in 'panthor_queue'
drivers/gpu/drm/panthor/panthor_sched.c:494: warning: Excess struct member 'seqno' description in 'panthor_queue'
drivers/gpu/drm/panthor/panthor_sched.c:494: warning: Excess struct member 'last_fence' description in 'panthor_queue'
drivers/gpu/drm/panthor/panthor_sched.c:494: warning: Excess struct member 'in_flight_jobs' description in 'panthor_queue'
drivers/gpu/drm/panthor/panthor_sched.c:494: warning: Excess struct member 'slots' description in 'panthor_queue'
drivers/gpu/drm/panthor/panthor_sched.c:494: warning: Excess struct member 'slot_count' description in 'panthor_queue'
drivers/gpu/drm/panthor/panthor_sched.c:494: warning: Excess struct member 'profiling_seqno' description in 'panthor_queue'
>> drivers/gpu/drm/panthor/panthor_sched.c:689: warning: Excess struct member 'data' description in 'panthor_group'
>> drivers/gpu/drm/panthor/panthor_sched.c:689: warning: Excess struct member 'lock' description in 'panthor_group'
drivers/gpu/drm/panthor/panthor_sched.c:822: warning: Function parameter or struct member 'profiling_slot' not described in 'panthor_job'
drivers/gpu/drm/panthor/panthor_sched.c:822: warning: Excess struct member 'start' description in 'panthor_job'
drivers/gpu/drm/panthor/panthor_sched.c:822: warning: Excess struct member 'size' description in 'panthor_job'
drivers/gpu/drm/panthor/panthor_sched.c:822: warning: Excess struct member 'latest_flush' description in 'panthor_job'
drivers/gpu/drm/panthor/panthor_sched.c:822: warning: Excess struct member 'start' description in 'panthor_job'
drivers/gpu/drm/panthor/panthor_sched.c:822: warning: Excess struct member 'end' description in 'panthor_job'
drivers/gpu/drm/panthor/panthor_sched.c:822: warning: Excess struct member 'profile_slot' description in 'panthor_job'
drivers/gpu/drm/panthor/panthor_sched.c:1745: warning: Function parameter or struct member 'ptdev' not described in 'panthor_sched_report_fw_events'
drivers/gpu/drm/panthor/panthor_sched.c:1745: warning: Function parameter or struct member 'events' not described in 'panthor_sched_report_fw_events'
drivers/gpu/drm/panthor/panthor_sched.c:2637: warning: Function parameter or struct member 'ptdev' not described in 'panthor_sched_report_mmu_fault'
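The "Excess struct member" warnings above are the usual symptom of kernel-doc meeting anonymous nested structs: members documented at the top level (e.g. `data`, `lock`) cannot be matched to fields living inside an unnamed inner struct such as `fdinfo`. kernel-doc accepts a dotted path for nested members, so a sketch of a fix for the two `panthor_group` warnings (assuming the nesting shown in the quoted source below) would look like:

```c
/**
 * struct panthor_group - Scheduling group object
 * ...
 * @fdinfo: Per-file total cycle and timestamp values reference.
 * @fdinfo.data: Pointer to actual per-file sample data.
 * @fdinfo.lock: Mutex to govern concurrent access from drm file's fdinfo
 *               callback and job post-completion processing function.
 */
```

The same `@outer.member` pattern applies to the `panthor_queue` warnings at line 494, one dotted entry per nested field.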
vim +689 drivers/gpu/drm/panthor/panthor_sched.c
de85488138247d0 Boris Brezillon 2024-02-29 531
de85488138247d0 Boris Brezillon 2024-02-29 532 /**
de85488138247d0 Boris Brezillon 2024-02-29 533 * struct panthor_group - Scheduling group object
de85488138247d0 Boris Brezillon 2024-02-29 534 */
de85488138247d0 Boris Brezillon 2024-02-29 535 struct panthor_group {
de85488138247d0 Boris Brezillon 2024-02-29 536 /** @refcount: Reference count */
de85488138247d0 Boris Brezillon 2024-02-29 537 struct kref refcount;
de85488138247d0 Boris Brezillon 2024-02-29 538
de85488138247d0 Boris Brezillon 2024-02-29 539 /** @ptdev: Device. */
de85488138247d0 Boris Brezillon 2024-02-29 540 struct panthor_device *ptdev;
de85488138247d0 Boris Brezillon 2024-02-29 541
de85488138247d0 Boris Brezillon 2024-02-29 542 /** @vm: VM bound to the group. */
de85488138247d0 Boris Brezillon 2024-02-29 543 struct panthor_vm *vm;
de85488138247d0 Boris Brezillon 2024-02-29 544
de85488138247d0 Boris Brezillon 2024-02-29 545 /** @compute_core_mask: Mask of shader cores that can be used for compute jobs. */
de85488138247d0 Boris Brezillon 2024-02-29 546 u64 compute_core_mask;
de85488138247d0 Boris Brezillon 2024-02-29 547
de85488138247d0 Boris Brezillon 2024-02-29 548 /** @fragment_core_mask: Mask of shader cores that can be used for fragment jobs. */
de85488138247d0 Boris Brezillon 2024-02-29 549 u64 fragment_core_mask;
de85488138247d0 Boris Brezillon 2024-02-29 550
de85488138247d0 Boris Brezillon 2024-02-29 551 /** @tiler_core_mask: Mask of tiler cores that can be used for tiler jobs. */
de85488138247d0 Boris Brezillon 2024-02-29 552 u64 tiler_core_mask;
de85488138247d0 Boris Brezillon 2024-02-29 553
de85488138247d0 Boris Brezillon 2024-02-29 554 /** @max_compute_cores: Maximum number of shader cores used for compute jobs. */
de85488138247d0 Boris Brezillon 2024-02-29 555 u8 max_compute_cores;
de85488138247d0 Boris Brezillon 2024-02-29 556
be7ffc821f5fc2e Liviu Dudau 2024-04-02 557 /** @max_fragment_cores: Maximum number of shader cores used for fragment jobs. */
de85488138247d0 Boris Brezillon 2024-02-29 558 u8 max_fragment_cores;
de85488138247d0 Boris Brezillon 2024-02-29 559
de85488138247d0 Boris Brezillon 2024-02-29 560 /** @max_tiler_cores: Maximum number of tiler cores used for tiler jobs. */
de85488138247d0 Boris Brezillon 2024-02-29 561 u8 max_tiler_cores;
de85488138247d0 Boris Brezillon 2024-02-29 562
de85488138247d0 Boris Brezillon 2024-02-29 563 /** @priority: Group priority (check panthor_csg_priority). */
de85488138247d0 Boris Brezillon 2024-02-29 564 u8 priority;
de85488138247d0 Boris Brezillon 2024-02-29 565
de85488138247d0 Boris Brezillon 2024-02-29 566 /** @blocked_queues: Bitmask reflecting the blocked queues. */
de85488138247d0 Boris Brezillon 2024-02-29 567 u32 blocked_queues;
de85488138247d0 Boris Brezillon 2024-02-29 568
de85488138247d0 Boris Brezillon 2024-02-29 569 /** @idle_queues: Bitmask reflecting the idle queues. */
de85488138247d0 Boris Brezillon 2024-02-29 570 u32 idle_queues;
de85488138247d0 Boris Brezillon 2024-02-29 571
de85488138247d0 Boris Brezillon 2024-02-29 572 /** @fatal_lock: Lock used to protect access to fatal fields. */
de85488138247d0 Boris Brezillon 2024-02-29 573 spinlock_t fatal_lock;
de85488138247d0 Boris Brezillon 2024-02-29 574
de85488138247d0 Boris Brezillon 2024-02-29 575 /** @fatal_queues: Bitmask reflecting the queues that hit a fatal exception. */
de85488138247d0 Boris Brezillon 2024-02-29 576 u32 fatal_queues;
de85488138247d0 Boris Brezillon 2024-02-29 577
de85488138247d0 Boris Brezillon 2024-02-29 578 /** @tiler_oom: Mask of queues that have a tiler OOM event to process. */
de85488138247d0 Boris Brezillon 2024-02-29 579 atomic_t tiler_oom;
de85488138247d0 Boris Brezillon 2024-02-29 580
de85488138247d0 Boris Brezillon 2024-02-29 581 /** @queue_count: Number of queues in this group. */
de85488138247d0 Boris Brezillon 2024-02-29 582 u32 queue_count;
de85488138247d0 Boris Brezillon 2024-02-29 583
de85488138247d0 Boris Brezillon 2024-02-29 584 /** @queues: Queues owned by this group. */
de85488138247d0 Boris Brezillon 2024-02-29 585 struct panthor_queue *queues[MAX_CS_PER_CSG];
de85488138247d0 Boris Brezillon 2024-02-29 586
de85488138247d0 Boris Brezillon 2024-02-29 587 /**
de85488138247d0 Boris Brezillon 2024-02-29 588 * @csg_id: ID of the FW group slot.
de85488138247d0 Boris Brezillon 2024-02-29 589 *
de85488138247d0 Boris Brezillon 2024-02-29 590 * -1 when the group is not scheduled/active.
de85488138247d0 Boris Brezillon 2024-02-29 591 */
de85488138247d0 Boris Brezillon 2024-02-29 592 int csg_id;
de85488138247d0 Boris Brezillon 2024-02-29 593
de85488138247d0 Boris Brezillon 2024-02-29 594 /**
de85488138247d0 Boris Brezillon 2024-02-29 595 * @destroyed: True when the group has been destroyed.
de85488138247d0 Boris Brezillon 2024-02-29 596 *
de85488138247d0 Boris Brezillon 2024-02-29 597 * If a group is destroyed it becomes useless: no further jobs can be submitted
de85488138247d0 Boris Brezillon 2024-02-29 598 * to its queues. We simply wait for all references to be dropped so we can
de85488138247d0 Boris Brezillon 2024-02-29 599 * release the group object.
de85488138247d0 Boris Brezillon 2024-02-29 600 */
de85488138247d0 Boris Brezillon 2024-02-29 601 bool destroyed;
de85488138247d0 Boris Brezillon 2024-02-29 602
de85488138247d0 Boris Brezillon 2024-02-29 603 /**
de85488138247d0 Boris Brezillon 2024-02-29 604 * @timedout: True when a timeout occurred on any of the queues owned by
de85488138247d0 Boris Brezillon 2024-02-29 605 * this group.
de85488138247d0 Boris Brezillon 2024-02-29 606 *
de85488138247d0 Boris Brezillon 2024-02-29 607 * Timeouts can be reported by drm_sched or by the FW. In any case, any
de85488138247d0 Boris Brezillon 2024-02-29 608 * timeout situation is unrecoverable, and the group becomes useless.
de85488138247d0 Boris Brezillon 2024-02-29 609 * We simply wait for all references to be dropped so we can release the
de85488138247d0 Boris Brezillon 2024-02-29 610 * group object.
de85488138247d0 Boris Brezillon 2024-02-29 611 */
de85488138247d0 Boris Brezillon 2024-02-29 612 bool timedout;
de85488138247d0 Boris Brezillon 2024-02-29 613
de85488138247d0 Boris Brezillon 2024-02-29 614 /**
de85488138247d0 Boris Brezillon 2024-02-29 615 * @syncobjs: Pool of per-queue synchronization objects.
de85488138247d0 Boris Brezillon 2024-02-29 616 *
de85488138247d0 Boris Brezillon 2024-02-29 617 * One sync object per queue. The position of the sync object is
de85488138247d0 Boris Brezillon 2024-02-29 618 * determined by the queue index.
de85488138247d0 Boris Brezillon 2024-02-29 619 */
de85488138247d0 Boris Brezillon 2024-02-29 620 struct panthor_kernel_bo *syncobjs;
de85488138247d0 Boris Brezillon 2024-02-29 621
d7baaf2591f58fc Adrián Larumbe 2024-09-03 622 /** @fdinfo: Per-file total cycle and timestamp values reference. */
d7baaf2591f58fc Adrián Larumbe 2024-09-03 623 struct {
d7baaf2591f58fc Adrián Larumbe 2024-09-03 624 /** @data: Pointer to actual per-file sample data. */
d7baaf2591f58fc Adrián Larumbe 2024-09-03 625 struct panthor_gpu_usage *data;
d7baaf2591f58fc Adrián Larumbe 2024-09-03 626
d7baaf2591f58fc Adrián Larumbe 2024-09-03 627 /**
d7baaf2591f58fc Adrián Larumbe 2024-09-03 628 * @lock: Mutex to govern concurrent access from drm file's fdinfo callback
d7baaf2591f58fc Adrián Larumbe 2024-09-03 629 * and job post-completion processing function
d7baaf2591f58fc Adrián Larumbe 2024-09-03 630 */
d7baaf2591f58fc Adrián Larumbe 2024-09-03 631 struct mutex lock;
d7baaf2591f58fc Adrián Larumbe 2024-09-03 632 } fdinfo;
d7baaf2591f58fc Adrián Larumbe 2024-09-03 633
de85488138247d0 Boris Brezillon 2024-02-29 634 /** @state: Group state. */
de85488138247d0 Boris Brezillon 2024-02-29 635 enum panthor_group_state state;
de85488138247d0 Boris Brezillon 2024-02-29 636
de85488138247d0 Boris Brezillon 2024-02-29 637 /**
de85488138247d0 Boris Brezillon 2024-02-29 638 * @suspend_buf: Suspend buffer.
de85488138247d0 Boris Brezillon 2024-02-29 639 *
de85488138247d0 Boris Brezillon 2024-02-29 640 * Stores the state of the group and its queues when a group is suspended.
de85488138247d0 Boris Brezillon 2024-02-29 641 * Used at resume time to restore the group in its previous state.
de85488138247d0 Boris Brezillon 2024-02-29 642 *
de85488138247d0 Boris Brezillon 2024-02-29 643 * The size of the suspend buffer is exposed through the FW interface.
de85488138247d0 Boris Brezillon 2024-02-29 644 */
de85488138247d0 Boris Brezillon 2024-02-29 645 struct panthor_kernel_bo *suspend_buf;
de85488138247d0 Boris Brezillon 2024-02-29 646
de85488138247d0 Boris Brezillon 2024-02-29 647 /**
de85488138247d0 Boris Brezillon 2024-02-29 648 * @protm_suspend_buf: Protection mode suspend buffer.
de85488138247d0 Boris Brezillon 2024-02-29 649 *
de85488138247d0 Boris Brezillon 2024-02-29 650 * Stores the state of the group and its queues when a group that's in
de85488138247d0 Boris Brezillon 2024-02-29 651 * protection mode is suspended.
de85488138247d0 Boris Brezillon 2024-02-29 652 *
de85488138247d0 Boris Brezillon 2024-02-29 653 * Used at resume time to restore the group in its previous state.
de85488138247d0 Boris Brezillon 2024-02-29 654 *
de85488138247d0 Boris Brezillon 2024-02-29 655 * The size of the protection mode suspend buffer is exposed through the
de85488138247d0 Boris Brezillon 2024-02-29 656 * FW interface.
de85488138247d0 Boris Brezillon 2024-02-29 657 */
de85488138247d0 Boris Brezillon 2024-02-29 658 struct panthor_kernel_bo *protm_suspend_buf;
de85488138247d0 Boris Brezillon 2024-02-29 659
de85488138247d0 Boris Brezillon 2024-02-29 660 /** @sync_upd_work: Work used to check/signal job fences. */
de85488138247d0 Boris Brezillon 2024-02-29 661 struct work_struct sync_upd_work;
de85488138247d0 Boris Brezillon 2024-02-29 662
de85488138247d0 Boris Brezillon 2024-02-29 663 /** @tiler_oom_work: Work used to process tiler OOM events happening on this group. */
de85488138247d0 Boris Brezillon 2024-02-29 664 struct work_struct tiler_oom_work;
de85488138247d0 Boris Brezillon 2024-02-29 665
de85488138247d0 Boris Brezillon 2024-02-29 666 /** @term_work: Work used to finish the group termination procedure. */
de85488138247d0 Boris Brezillon 2024-02-29 667 struct work_struct term_work;
de85488138247d0 Boris Brezillon 2024-02-29 668
de85488138247d0 Boris Brezillon 2024-02-29 669 /**
de85488138247d0 Boris Brezillon 2024-02-29 670 * @release_work: Work used to release group resources.
de85488138247d0 Boris Brezillon 2024-02-29 671 *
de85488138247d0 Boris Brezillon 2024-02-29 672 * We need to postpone the group release to avoid a deadlock when
de85488138247d0 Boris Brezillon 2024-02-29 673 * the last ref is released in the tick work.
de85488138247d0 Boris Brezillon 2024-02-29 674 */
de85488138247d0 Boris Brezillon 2024-02-29 675 struct work_struct release_work;
de85488138247d0 Boris Brezillon 2024-02-29 676
de85488138247d0 Boris Brezillon 2024-02-29 677 /**
de85488138247d0 Boris Brezillon 2024-02-29 678 * @run_node: Node used to insert the group in the
de85488138247d0 Boris Brezillon 2024-02-29 679 * panthor_group::groups::{runnable,idle} and
de85488138247d0 Boris Brezillon 2024-02-29 680 * panthor_group::reset.stopped_groups lists.
de85488138247d0 Boris Brezillon 2024-02-29 681 */
de85488138247d0 Boris Brezillon 2024-02-29 682 struct list_head run_node;
de85488138247d0 Boris Brezillon 2024-02-29 683
de85488138247d0 Boris Brezillon 2024-02-29 684 /**
de85488138247d0 Boris Brezillon 2024-02-29 685 * @wait_node: Node used to insert the group in the
de85488138247d0 Boris Brezillon 2024-02-29 686 * panthor_group::groups::waiting list.
de85488138247d0 Boris Brezillon 2024-02-29 687 */
de85488138247d0 Boris Brezillon 2024-02-29 688 struct list_head wait_node;
de85488138247d0 Boris Brezillon 2024-02-29 @689 };
de85488138247d0 Boris Brezillon 2024-02-29 690
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
Hi Adrián,
kernel test robot noticed the following build errors:
[auto build test ERROR on drm-misc/drm-misc-next]
[also build test ERROR on linus/master v6.11-rc6 next-20240904]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting your patch, we suggest using '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]
url: https://github.com/intel-lab-lkp/linux/commits/Adri-n-Larumbe/drm-panthor-i…
base: git://anongit.freedesktop.org/drm/drm-misc drm-misc-next
patch link: https://lore.kernel.org/r/20240903202541.430225-2-adrian.larumbe%40collabor…
patch subject: [PATCH v5 1/4] drm/panthor: introduce job cycle and timestamp accounting
config: arc-allmodconfig (https://download.01.org/0day-ci/archive/20240905/202409050054.oRwtzLQ4-lkp@…)
compiler: arceb-elf-gcc (GCC) 13.2.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20240905/202409050054.oRwtzLQ4-lkp@…)
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add the following tags
| Reported-by: kernel test robot <lkp(a)intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202409050054.oRwtzLQ4-lkp@intel.com/
All errors (new ones prefixed by >>):
In file included from <command-line>:
In function 'copy_instrs_to_ringbuf',
inlined from 'queue_run_job' at drivers/gpu/drm/panthor/panthor_sched.c:3089:2:
>> include/linux/compiler_types.h:510:45: error: call to '__compiletime_assert_435' declared with attribute error: min(ringbuf_size - start, size) signedness error
510 | _compiletime_assert(condition, msg, __compiletime_assert_, __COUNTER__)
| ^
include/linux/compiler_types.h:491:25: note: in definition of macro '__compiletime_assert'
491 | prefix ## suffix(); \
| ^~~~~~
include/linux/compiler_types.h:510:9: note: in expansion of macro '_compiletime_assert'
510 | _compiletime_assert(condition, msg, __compiletime_assert_, __COUNTER__)
| ^~~~~~~~~~~~~~~~~~~
include/linux/build_bug.h:39:37: note: in expansion of macro 'compiletime_assert'
39 | #define BUILD_BUG_ON_MSG(cond, msg) compiletime_assert(!(cond), msg)
| ^~~~~~~~~~~~~~~~~~
include/linux/minmax.h:100:9: note: in expansion of macro 'BUILD_BUG_ON_MSG'
100 | BUILD_BUG_ON_MSG(!__types_ok(x,y,ux,uy), \
| ^~~~~~~~~~~~~~~~
include/linux/minmax.h:105:9: note: in expansion of macro '__careful_cmp_once'
105 | __careful_cmp_once(op, x, y, __UNIQUE_ID(x_), __UNIQUE_ID(y_))
| ^~~~~~~~~~~~~~~~~~
include/linux/minmax.h:129:25: note: in expansion of macro '__careful_cmp'
129 | #define min(x, y) __careful_cmp(min, x, y)
| ^~~~~~~~~~~~~
drivers/gpu/drm/panthor/panthor_sched.c:2882:19: note: in expansion of macro 'min'
2882 | written = min(ringbuf_size - start, size);
| ^~~
vim +/__compiletime_assert_435 +510 include/linux/compiler_types.h
eb5c2d4b45e3d2 Will Deacon 2020-07-21 496
eb5c2d4b45e3d2 Will Deacon 2020-07-21 497 #define _compiletime_assert(condition, msg, prefix, suffix) \
eb5c2d4b45e3d2 Will Deacon 2020-07-21 498 __compiletime_assert(condition, msg, prefix, suffix)
eb5c2d4b45e3d2 Will Deacon 2020-07-21 499
eb5c2d4b45e3d2 Will Deacon 2020-07-21 500 /**
eb5c2d4b45e3d2 Will Deacon 2020-07-21 501 * compiletime_assert - break build and emit msg if condition is false
eb5c2d4b45e3d2 Will Deacon 2020-07-21 502 * @condition: a compile-time constant condition to check
eb5c2d4b45e3d2 Will Deacon 2020-07-21 503 * @msg: a message to emit if condition is false
eb5c2d4b45e3d2 Will Deacon 2020-07-21 504 *
eb5c2d4b45e3d2 Will Deacon 2020-07-21 505 * In tradition of POSIX assert, this macro will break the build if the
eb5c2d4b45e3d2 Will Deacon 2020-07-21 506 * supplied condition is *false*, emitting the supplied error message if the
eb5c2d4b45e3d2 Will Deacon 2020-07-21 507 * compiler has support to do so.
eb5c2d4b45e3d2 Will Deacon 2020-07-21 508 */
eb5c2d4b45e3d2 Will Deacon 2020-07-21 509 #define compiletime_assert(condition, msg) \
eb5c2d4b45e3d2 Will Deacon 2020-07-21 @510 _compiletime_assert(condition, msg, __compiletime_assert_, __COUNTER__)
eb5c2d4b45e3d2 Will Deacon 2020-07-21 511