On Wed, 2025-04-23 at 10:30 -0700, Mina Almasry wrote:
On Wed, Apr 23, 2025 at 9:03 AM Cosmin Ratiu <cratiu@nvidia.com> wrote:
Drivers that are told to allocate RX buffers from pools of DMA memory should have enough memory in the pool to satisfy projected allocation requests (a function of ring size, MTU & other parameters). If there's not enough memory, RX ring refill might fail later at inconvenient times (e.g. during NAPI poll).
My understanding is that if the RX ring refill fails, the driver will post the buffers it was able to allocate data for and will not post the others. So it will run with degraded performance, but nothing overly bad should happen. This should be the same behavior as when the machine is under memory pressure.
What motivated this change was a failure in how mlx5 refills rings today. For efficiency, a ring refill triggered by NAPI polling only releases old buffers just before allocating new ones, so it effectively has a built-in assumption that the ring can always be fully refilled. Commit 4c2a13236807 ("net/mlx5e: RX, Defer page release in striding rq for better recycling") is an interesting read here.
For more details, see the do { } while loop in mlx5e_post_rx_mpwqes, where mlx5e_free_rx_mpwqe and then mlx5e_alloc_rx_mpwqe are called to free the old buffer and allocate a new one at the same position. This has excellent cache locality, and the pages returned to the pool are reused by the new descriptor.
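Roughly, the pattern is this (a heavily simplified sketch, not the actual kernel code; the argument lists and the index arithmetic are approximations, only the free-then-allocate ordering is the point):

/*
 * Simplified sketch of the refill loop -- not the real mlx5 code.
 */
static void refill_sketch(struct mlx5e_rq *rq, u16 head, u16 missing, u16 wq_sz)
{
	do {
		struct mlx5e_mpw_info *wi = mlx5e_get_mpw_info(rq, head);

		/* Release the pages of the old descriptor in this slot
		 * back to the page_pool...
		 */
		mlx5e_free_rx_mpwqe(rq, wi);

		/* ...and immediately build a new descriptor in the same
		 * slot, ideally backed by the pages just returned (hence
		 * the good cache locality and recycling).
		 */
		if (unlikely(mlx5e_alloc_rx_mpwqe(rq, head)))
			break;	/* pool empty: the slot stays unfilled */

		head = (head + 1) % wq_sz;	/* simplified index advance */
	} while (--missing);
}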
The bug in mlx5 is that with a large MTU & ring size, the ring cannot be fully populated with RX descriptors because the pool doesn't have enough memory, and because buffers are only released at refill time, no memory is ever returned to the pool for new ones. Eventually, RX descriptors are exhausted and traffic stops.
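To put made-up numbers on it: if each striding-RQ descriptor spans 8 pages and the ring has 1024 entries, filling the ring needs about 1024 * 8 * 4 KiB = 32 MiB of pool memory. A 16 MiB dmabuf can back at most ~512 descriptors, so the initial fill stops halfway, and since pages only come back to the pool when an old descriptor is freed during refill, the remaining slots can never be populated. (The 8 pages / 1024 entries / 16 MiB figures are purely illustrative, not the actual failing configuration.)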
In general I don't know about this change. If the user wants to use very small dmabufs, they should be able to, without jumping through hoops to reduce the number of RX ring slots the driver has (if it supports configuring that).
I think printing an error or warning that the dmabuf is too small for the pool_size may be fine. But outright failing this configuration? I don't think so.
For regular memory-backed page pools, there's no hard limit on how big they can become (other than available physical memory), so this issue was not seen before.
I haven't looked at other drivers yet, but is it expected that drivers operate with incompletely filled rings? I assumed that since the user configured a specific ring size, they expect drivers to be able to use that size and not silently operate in degraded mode with a smaller effective ring.
If you think drivers should work in degraded mode, we can look at improving the ring population code to work better in this scenario.
pool_size seems to be the number of ptr_ring slots in the page_pool, not some upper or lower bound on the amount of memory the page_pool can provide. So this check seems useless? The page_pool can still fail to provide this amount of memory with dmabuf (if the netmems aren't being recycled fast enough) or with normal memory (under memory pressure).
I think pool_size is usually set by drivers based on their params, and it's the max size of pool->ring. The opportunistic check I added compares this demand with the supply (the available chunk memory) and fails the config, based on the assumption that there should be enough memory in the pool to satisfy the driver's needs.
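Something along these lines, as a simplified illustration of the idea (not the exact hunk from the patch; the function name is made up, and treating pool_size * page size as the demand and the dmabuf size as the supply is exactly the assumption under discussion):

/*
 * Illustration only -- not the exact code from the patch. Demand is
 * taken as pool_size netmems of (PAGE_SIZE << order) bytes each;
 * supply is the total size of the bound dmabuf.
 */
static int dmabuf_pool_size_check(struct page_pool *pool,
				  struct net_devmem_dmabuf_binding *binding)
{
	u64 needed = (u64)pool->p.pool_size * (PAGE_SIZE << pool->p.order);

	if (needed > binding->dmabuf->size) {
		/* Could be only a warning instead, as you suggest above. */
		return -ENOMEM;
	}

	return 0;
}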
Please let me know your thoughts and how to proceed.
Cosmin.