Re: [PATCH net 1/2] net/devmem: Reject insufficiently large dmabuf pools

24 Apr 2025


      On Thu, Apr 24, 2025 at 3:40 PM Stanislav Fomichev stfomichev@gmail.com wrote:
...
On 04/24, Mina Almasry wrote:
...
On Thu, Apr 24, 2025 at 3:10 PM Stanislav Fomichev stfomichev@gmail.com wrote:
...
On 04/24, Mina Almasry wrote:
...
On Wed, Apr 23, 2025 at 1:15 PM Stanislav Fomichev stfomichev@gmail.com wrote:
...
On 04/23, Mina Almasry wrote:
...
On Wed, Apr 23, 2025 at 9:03 AM Cosmin Ratiu cratiu@nvidia.com wrote:
>
> Drivers that are told to allocate RX buffers from pools of DMA memory
> should have enough memory in the pool to satisfy projected allocation
> requests (a function of ring size, MTU & other parameters). If there's
> not enough memory, RX ring refill might fail later at inconvenient times
> (e.g. during NAPI poll).
>
My understanding is that if the RX ring refill fails, the driver will
post the buffers it was able to allocate data for, and will not post
other buffers. So it will run with a degraded performance but nothing
overly bad should happen. This should be the same behavior if the
machine is under memory pressure.
In general I don't know about this change. If the user wants to use
very small dmabufs, they should be able to, without going through
hoops reducing the number of rx ring slots the driver has (if it
supports configuring that).
I think maybe printing an error or warning that the dmabuf is too
small for the pool_size may be fine. But outright failing this
configuration? I don't think so.
> This commit adds a check at dmabuf pool init time that compares the
> amount of memory in the underlying chunk pool (configured by the user
> space application providing dmabuf memory) with the desired pool size
> (previously set by the driver) and fails with an error message if chunk
> memory isn't enough.
>
> Fixes: 0f9214046893 ("memory-provider: dmabuf devmem memory provider")
> Signed-off-by: Cosmin Ratiu cratiu@nvidia.com
> ---
>  net/core/devmem.c | 11 +++++++++++
>  1 file changed, 11 insertions(+)
>
> diff --git a/net/core/devmem.c b/net/core/devmem.c
> index 6e27a47d0493..651cd55ebb28 100644
> --- a/net/core/devmem.c
> +++ b/net/core/devmem.c
> @@ -299,6 +299,7 @@ net_devmem_bind_dmabuf(struct net_device *dev, unsigned int dmabuf_fd,
>  int mp_dmabuf_devmem_init(struct page_pool *pool)
>  {
>         struct net_devmem_dmabuf_binding *binding = pool->mp_priv;
> +       size_t size;
>
>         if (!binding)
>                 return -EINVAL;
> @@ -312,6 +313,16 @@ int mp_dmabuf_devmem_init(struct page_pool *pool)
>         if (pool->p.order != 0)
>                 return -E2BIG;
>
> +       /* Validate that the underlying dmabuf has enough memory to satisfy
> +        * requested pool size.
> +        */
> +       size = gen_pool_size(binding->chunk_pool) >> PAGE_SHIFT;
> +       if (size < pool->p.pool_size) {
pool_size seems to be the number of ptr_ring slots in the page_pool,
not some upper or lower bound on the amount of memory the page_pool
can provide. So this check seems useless? The page_pool can still not
provide this amount of memory with dmabuf (if the netmems aren't being
recycled fast enough) or with normal memory (under memory pressure).
I read this check more as "is there enough chunks in the binding to
fully fill in the page pool". User controls the size of rx ring
Only on drivers that support ethtool -G, and where it will let you
configure -G to what you want.
gve is the minority here, any major nic (brcm/mlx/intel) supports resizing
the rings.
GVE supports resizing rings. Other drivers may not. Even on drivers
that support resizing rings. Some users may have a use case for a
dmabuf smaller than the minimum ring size their driver accepts.
...
...
...
which
controls the size of the page pool which somewhat dictates the minimal
size of the binding (maybe).
See the test I ran in the other thread. Seems at least GVE is fine
with dmabuf size < ring size. I don't know what other drivers do, but
generally speaking I think specific driver limitations should not
limit what others can do with their drivers. Sure for the GPU mem
applications you're probably looking at the dmabufs are huge and
supporting small dmabufs is not a concern, but someone somewhere may
want to run with 1 MB dmabuf for some use case and if their driver is
fine with it, core should not prevent it, I think.
...
So it's more of a sanity check.
Maybe having better defaults in ncdevmem would've been a better option? It
allocates (16000*4096) bytes (slightly less than 64MB, why? to fit into
default /sys/module/udmabuf/parameters/size_limit_mb?) and on my setup
PP wants to get 64MB at least..
Yeah, udmabuf has a limitation that it only supports 64MB max size
last I looked.
We can use /sys/module/udmabuf/parameters/size_limit_mb to allocate
more than 64MB, ncdevmem can change it.
The udmabuf limit is hardcoded, in udmabuf.c or configured on module
load, and ncdevmem doesn't load udmabuf. I guess it could be changed
to that, but currently ncdevmem works with CONFIG_UDMABUF=y
You don't need to load/reload the module to change module params:
# id
uid=0(root) gid=0(root) groups=0(root),1(bin),2(daemon),3(sys)
# cat /sys/module/udmabuf/parameters/size_limit_mb
64
# echo 128 > /sys/module/udmabuf/parameters/size_limit_mb
# cat /sys/module/udmabuf/parameters/size_limit_mb
128
Today I learned! Thanks!
I will put it on my todo list to make ncdevmem force a larger limit to
udmabuf and allocate a larger dmabuf by default. Technically any
dmabuf should be supported, but by default I think probably ncdevmem
should use a .5 GB -> 1GB dmabuf that is more common for these GPU
applications or something. There could be an option as well for folks
to test their driver against smaller dmabufs.
-- 
Thanks,
Mina

2025

2024

2023

2022

2021

2020

2019

2018

2017

Re: [PATCH net 1/2] net/devmem: Reject insufficiently large dmabuf pools