On Thu, Apr 11, 2024 at 1:21 AM Rong Qianfeng <11065417(a)vivo.com> wrote:
>
>
> 在 2024/4/10 0:37, T.J. Mercier 写道:
> > [You don't often get email from tjmercier(a)google.com. Learn why this is important at https://aka.ms/LearnAboutSenderIdentification ]
> >
> > On Tue, Apr 9, 2024 at 12:34 AM Rong Qianfeng <11065417(a)vivo.com> wrote:
> >>
> >> 在 2024/4/8 15:58, Christian König 写道:
> >>> Am 07.04.24 um 09:50 schrieb Rong Qianfeng:
> >>>> [SNIP]
> >>>>> Am 13.11.21 um 07:22 schrieb Jianqun Xu:
> >>>>>> Add DMA_BUF_IOCTL_SYNC_PARTIAL support for user to sync dma-buf with
> >>>>>> offset and len.
> >>>>> You have not given an use case for this so it is a bit hard to
> >>>>> review. And from the existing use cases I don't see why this should
> >>>>> be necessary.
> >>>>>
> >>>>> Even worse from the existing backend implementation I don't even see
> >>>>> how drivers should be able to fulfill this semantics.
> >>>>>
> >>>>> Please explain further,
> >>>>> Christian.
> >>>> Here is a practical case:
> >>>> The user space can allocate a large chunk of dma-buf for
> >>>> self-management, used as a shared memory pool.
> >>>> Small dma-buf can be allocated from this shared memory pool and
> >>>> released back to it after use, thus improving the speed of dma-buf
> >>>> allocation and release.
> >>>> Additionally, custom functionalities such as memory statistics and
> >>>> boundary checking can be implemented in the user space.
> >>>> Of course, the above-mentioned functionalities require the
> >>>> implementation of a partial cache sync interface.
> >>> Well that is obvious, but where is the code doing that?
> >>>
> >>> You can't send out code without an actual user of it. That will
> >>> obviously be rejected.
> >>>
> >>> Regards,
> >>> Christian.
> >> In fact, we have already used the user-level dma-buf memory pool in the
> >> camera shooting scenario on the phone.
> >>
> >> From the test results, The execution time of the photo shooting
> >> algorithm has been reduced from 3.8s to 3s.
> >>
> > For phones, the (out of tree) Android version of the system heap has a
> > page pool connected to a shrinker. That allows you to skip page
> > allocation without fully pinning the memory like you get when
> > allocating a dma-buf that's way larger than necessary. If it's for a
> > camera maybe you need physically contiguous memory, but it's also
> > possible to set that up.
> >
> > https://android.googlesource.com/kernel/common/+/refs/heads/android14-6.1/d…
> >
> Thank you for the reminder.
>
> The page pool of the system heap can meet the needs of most scenarios,
> but the camera shooting scenario is quite special.
>
> Why the camera shooting scenario needs to have a dma-buf memory pool in
> the user-level?
>
> (1) The memory demand is extremely high and time requirements are
> stringent.
Preallocating for this makes sense to me, whether it is individual
buffers or one large one.
> (2) The memory in the page pool(system heap) is easily reclaimed or used
> by other apps.
Yeah, by design I guess. I have seen an implementation where the page
pool is proactively refilled after it has been empty for some time
which would help in scenarios where the pool is often reclaimed and
low/empty. You could add that in a vendor heap.
> (3) High concurrent allocation and release (with deferred-free) lead to
> high memory usage peaks.
Hopefully this is not every frame? I saw enough complaints about the
deferred free of pool pages that it hasn't been carried forward since
Android 13, so this should be less of a problem on recent kernels.
> Additionally, after the camera exits, the shared memory pool can be
> released, with minimal impact.
Why do you care about the difference here? In one case it's when the
buffer refcount goes to 0 and memory is freed immediately, and in the
other it's the next time reclaim runs the shrinker.
I don't want to add UAPI for DMA_BUF_IOCTL_SYNC_PARTIAL to Android
without it being in the upstream kernel. I don't think we can get that
without an in-kernel user of dma_buf_begin_cpu_access_partial first,
even though your use case wouldn't rely on that in-kernel usage. :\ So
if you want to add this to phones for your camera app, then I think
your best option is to add a vendor driver which implements this IOCTL
and calls the dma_buf_begin_cpu_access_partial functions which are
already exported.
Best,
T.J.
On Tue, Apr 9, 2024 at 12:34 AM Rong Qianfeng <11065417(a)vivo.com> wrote:
>
>
> 在 2024/4/8 15:58, Christian König 写道:
> > Am 07.04.24 um 09:50 schrieb Rong Qianfeng:
> >> [SNIP]
> >>> Am 13.11.21 um 07:22 schrieb Jianqun Xu:
> >>>> Add DMA_BUF_IOCTL_SYNC_PARTIAL support for user to sync dma-buf with
> >>>> offset and len.
> >>>
> >>> You have not given an use case for this so it is a bit hard to
> >>> review. And from the existing use cases I don't see why this should
> >>> be necessary.
> >>>
> >>> Even worse from the existing backend implementation I don't even see
> >>> how drivers should be able to fulfill this semantics.
> >>>
> >>> Please explain further,
> >>> Christian.
> >> Here is a practical case:
> >> The user space can allocate a large chunk of dma-buf for
> >> self-management, used as a shared memory pool.
> >> Small dma-buf can be allocated from this shared memory pool and
> >> released back to it after use, thus improving the speed of dma-buf
> >> allocation and release.
> >> Additionally, custom functionalities such as memory statistics and
> >> boundary checking can be implemented in the user space.
> >> Of course, the above-mentioned functionalities require the
> >> implementation of a partial cache sync interface.
> >
> > Well that is obvious, but where is the code doing that?
> >
> > You can't send out code without an actual user of it. That will
> > obviously be rejected.
> >
> > Regards,
> > Christian.
>
> In fact, we have already used the user-level dma-buf memory pool in the
> camera shooting scenario on the phone.
>
> From the test results, The execution time of the photo shooting
> algorithm has been reduced from 3.8s to 3s.
>
For phones, the (out of tree) Android version of the system heap has a
page pool connected to a shrinker. That allows you to skip page
allocation without fully pinning the memory like you get when
allocating a dma-buf that's way larger than necessary. If it's for a
camera maybe you need physically contiguous memory, but it's also
possible to set that up.
https://android.googlesource.com/kernel/common/+/refs/heads/android14-6.1/d…
> To be honest, I didn't understand your concern "...how drivers should be
> able to fulfill this semantics." Can you please help explain it in more
> detail?
>
> Thanks,
>
> Rong Qianfeng.
>
> >
> >>
> >> Thanks
> >> Rong Qianfeng.
> >
>
On Wed, 10 Apr 2024 16:02:00 +0200 Julien Panis wrote:
> > You shouldn't build the skb upfront any more. Give the page to the HW,
> > once HW sends you a completion - build the skbs. If build fails
> > (allocation failure) just give the page back to HW. If it succeeds,
> > however, you'll get a skb which is far more likely to be cache hot.
>
> Not sure I get this point.
>
> "Give the page to the HW" = Do you mean that I should dma_map_single()
> a full page (|PAGE_SIZE|) in am65_cpsw_nuss_rx_push() ?
Yes, I think so. I think that's what you effectively do now anyway,
you just limit the len and wrap it in an skb. But
am65_cpsw_nuss_rx_push() will effectively get that page back from
skb->data and map it.
On Wed, 10 Apr 2024 10:36:16 +0200 Julien Panis wrote:
> Also, about mem alloc failures, shouldn't we free 'pool' on kstrdup_const()
> error at the beginning of k3_cppi_desc_pool_create_name() ?
> I mean, it's not visible in my patch but I now wonder if this was done
> properly even before I modify the file.
Yes, it uses managed memory (devm_*) but prueth (I didn't check other
callers) calls it from .ndo_open. So while not technically a full leak
we can accumulate infinite memory by repeatedly failing here :S
On Mon, 08 Apr 2024 11:38:04 +0200 Julien Panis wrote:
> +static struct sk_buff *am65_cpsw_alloc_skb(struct am65_cpsw_rx_chn *rx_chn,
> + struct net_device *ndev,
> + unsigned int len,
> + int desc_idx,
> + bool allow_direct)
> +{
> + struct sk_buff *skb;
> + struct page *page;
> +
> + page = page_pool_dev_alloc_pages(rx_chn->page_pool);
> + if (unlikely(!page))
> + return NULL;
> +
> + len += AM65_CPSW_HEADROOM;
> +
> + skb = build_skb(page_address(page), len);
You shouldn't build the skb upfront any more. Give the page to the HW,
once HW sends you a completion - build the skbs. If build fails
(allocation failure) just give the page back to HW. If it succeeds,
however, you'll get a skb which is far more likely to be cache hot.
> + if (unlikely(!skb)) {
> + page_pool_put_full_page(rx_chn->page_pool, page, allow_direct);
> + rx_chn->pages[desc_idx] = NULL;
> + return NULL;
> + }
On Mon, 08 Apr 2024 11:38:03 +0200 Julien Panis wrote:
> goto gen_pool_create_fail;
> }
>
> + pool->desc_infos = kcalloc(pool->num_desc,
> + sizeof(*pool->desc_infos), GFP_KERNEL);
> + if (!pool->desc_infos) {
> + ret = -ENOMEM;
> + dev_err(pool->dev,
> + "pool descriptor infos alloc failed %d\n", ret);
Please don't add errors on mem alloc failures. They just bloat the
kernel, there will be a rather large OOM splat in the logs if GFP_KERNEL
allocation fails.
> + kfree_const(pool_name);
> + goto gen_pool_desc_infos_alloc_fail;
> + }
> +
> pool->gen_pool->name = pool_name;
If you add the new allocation after this line, I think you wouldn't
have to free pool_name under the if () explicitly.
--
pw-bot: cr
On 09/04/2024 15:42, Alexandre Mergnat wrote:
> Add the audio codec sub-device. This sub-device is used to set required
> and optional voltage properties between the codec and the board.
>
> Signed-off-by: Alexandre Mergnat <amergnat(a)baylibre.com>
> ---
> Documentation/devicetree/bindings/mfd/mediatek,mt6357.yaml | 5 +++++
> 1 file changed, 5 insertions(+)
>
> diff --git a/Documentation/devicetree/bindings/mfd/mediatek,mt6357.yaml b/Documentation/devicetree/bindings/mfd/mediatek,mt6357.yaml
> index 37423c2e0fdf..7c6a4a587b5f 100644
> --- a/Documentation/devicetree/bindings/mfd/mediatek,mt6357.yaml
> +++ b/Documentation/devicetree/bindings/mfd/mediatek,mt6357.yaml
> @@ -37,6 +37,11 @@ properties:
> "#interrupt-cells":
> const: 2
>
> + codec:
> + type: object
> + $ref: /schemas/sound/mt6357.yaml
> + unevaluatedProperties: false
Just put the properties here, without the codec node.
Also, your example is now incomplete.
Best regards,
Krzysztof