Re: [RFC PATCH v3 04/12] netdev: support binding dma-buf to netdevice

10 Nov 2023


      On Thu, Nov 9, 2023 at 11:38 PM Yunsheng Lin linyunsheng@huawei.com wrote:
...
On 2023/11/10 10:59, Mina Almasry wrote:
...
On Thu, Nov 9, 2023 at 12:30 AM Paolo Abeni pabeni@redhat.com wrote:
...
I'm trying to wrap my head around the whole infra... the above line is
confusing. Why do you increment dma_addr? it will be re-initialized in
the next iteration.
That is just a mistake, sorry. Will remove this increment.
You seems to be combining comments in different thread and replying in
one thread, I am not sure that is a good practice and I almost missed the
reply below as I don't seem to be cc'ed.
Sorry about that.
...
...
On Thu, Nov 9, 2023 at 1:29 AM Yunsheng Lin linyunsheng@huawei.com wrote:> >>>
...
...
...
...
gen_pool_destroy BUG_ON() if it's not empty at the time of destroying.
Technically that should never happen, because
__netdev_devmem_binding_free() should only be called when the refcount
hits 0, so all the chunks have been freed back to the gen_pool. But,
just in case, I don't want to crash the server just because I'm
leaking a chunk... this is a bit of defensive programming that is
typically frowned upon, but the behavior of gen_pool is so severe I
think the WARN() + check is warranted here.
It seems it is pretty normal for the above to happen nowadays because of
retransmits timeouts, NAPI defer schemes mentioned below:
https://lkml.kernel.org/netdev/168269854650.2191653.8465259808498269815.stgi...
And currently page pool core handles that by using a workqueue.
Forgive me but I'm not understanding the concern here.
__netdev_devmem_binding_free() is called when binding->ref hits 0.
binding->ref is incremented when an iov slice of the dma-buf is
allocated, and decremented when an iov is freed. So,
__netdev_devmem_binding_free() can't really be called unless all the
iovs have been freed, and gen_pool_size() == gen_pool_avail(),
regardless of what's happening on the page_pool side of things, right?
I seems to misunderstand it. In that case, it seems to be about
defensive programming like other checking.
By looking at it more closely, it seems napi_frag_unref() call
page_pool_page_put_many() directly， which means devmem seems to
be bypassing the napi_safe optimization.
Can napi_frag_unref() reuse napi_pp_put_page() in order to reuse
the napi_safe optimization?
I think it already does. page_pool_page_put_many() is only called if
!recycle or !napi_pp_put_page(). In that case
page_pool_page_put_many() is just a replacement for put_page(),
because this 'page' may be an iov.
Is there a reason why not calling napi_pp_put_page() for devmem too
instead of calling page_pool_page_put_many()? mem provider has a
'release_page' ops, calling page_pool_page_put_many() directly here
seems to be bypassing the 'release_page' ops, which means devmem is
bypassing most of the main features of page pool.
I think we're still calling napi_pp_put_page() as normal:
/**
@@ -3441,13 +3466,13 @@ bool napi_pp_put_page(struct page *page, bool
napi_safe);
 static inline void
 napi_frag_unref(skb_frag_t *frag, bool recycle, bool napi_safe)
 {
-       struct page *page = skb_frag_page(frag);
-
 #ifdef CONFIG_PAGE_POOL
-       if (recycle && napi_pp_put_page(page, napi_safe))
+       if (recycle && napi_pp_put_page(frag->bv_page, napi_safe))
                return;
+       page_pool_page_put_many(frag->bv_page, 1);
+#else
+       put_page(skb_frag_page(frag));
 #endif
-       put_page(page);
 }
The code change here is to replace put_page() with
page_pool_page_put_many(), only, because bv_page may be a
page_pool_iov, so we need to use page_pool_page_put_many() which
handles page_pool_iov correcly. I did not change whether or not
napi_pp_put_page() is called. It's still called if recycle==true.
...
As far as I can tell, the main features of page pool:

Allow lockless allocation and freeing in pool->alloc cache by
utilizing NAPI non-concurrent context.
Allow concurrent allocation and freeing in pool->ring cache by
utilizing ptr_ring.
Allow dma map/unmap and cache sync optimization.
Allow detailed stats logging and tracing.
Allow some bulk allocation and freeing.
support both skb packet and xdp frame.

I am wondering what is the main features that devmem is utilizing
by intergrating into page pool?
It seems the driver can just call netdev_alloc_devmem() and
napi_frag_unref() can call netdev_free_devmem() directly without
intergrating into page pool and it should just works too?
Maybe we should consider creating a new thin layer, in order to
demux to page pool, devmem or other mem type if my suggestion does
not work out too?
I went through this discussion with Jesper on RFC v2 in this thread:
https://lore.kernel.org/netdev/CAHS8izOVJGJH5WF68OsRWFKJid1_huzzUK+hpKbLcL4p...
which culminates with that email where he seems on board with the
change from a performance POV, and seems on board with hiding the
memory type implementation from the drivers. That thread fully goes
over the tradeoffs of integrating over the page pool or creating new
ones. Integrating with the page pool abstracts most of the devmem
implementation (and other memory types) from the driver. It reuses
page pool features like page recycling for example.
-- 
Thanks,
Mina

    

2026

2025

2024

2023

2022

2021

2020

2019

2018

2017

Re: [RFC PATCH v3 04/12] netdev: support binding dma-buf to netdevice