On Sun, 5 Nov 2023 18:44:09 -0800 Mina Almasry wrote:
> + if (!skb_frags_not_readable(skb)) {
nit: maybe call the helper skb_frags_readable() and flip
the polarity, avoid double negatives.
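To make the suggestion concrete, here is a minimal userspace sketch of the positive-polarity helper. The struct is a stand-in, not the real sk_buff, and `devmem` is an illustrative field name; only the helper's name and polarity follow the suggestion:

```c
#include <stdbool.h>

/* Stand-in for the real sk_buff; only the one bit we need here. */
struct skb_stub {
	bool devmem; /* frags live in unreadable device memory */
};

/* Positive-polarity helper as suggested; call sites then read
 * "if (skb_frags_readable(skb))" instead of "!not_readable". */
static inline bool skb_frags_readable(const struct skb_stub *skb)
{
	return !skb->devmem;
}
```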
On Sun, 5 Nov 2023 18:44:01 -0800 Mina Almasry wrote:
> diff --git a/include/net/page_pool/types.h b/include/net/page_pool/types.h
> index 6fc5134095ed..d4bea053bb7e 100644
> --- a/include/net/page_pool/types.h
> +++ b/include/net/page_pool/types.h
> @@ -60,6 +60,8 @@ struct page_pool_params {
> int nid;
> struct device *dev;
> struct napi_struct *napi;
> + u8 memory_provider;
> + void *mp_priv;
> enum dma_data_direction dma_dir;
> unsigned int max_len;
> unsigned int offset;
you should rebase on top of net-next
More importantly I was expecting those fields to be gone from params.
The fact that the page pool is configured to a specific provider
should be fully transparent to the driver, the driver should just tell
the core what queue it's creating the pool for and if there's a dmabuf
bound to that queue - out pops a pp backed by the dmabuf.
My RFC had the page pool params fields here as a hack.
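A toy model of that intended flow, with all type and function names made up for illustration (the real types would be page_pool_params plus whatever per-queue binding net-next grows):

```c
#include <stddef.h>

/* Illustrative stand-ins; none of these names are from the patch. */
struct dmabuf_binding_stub { int id; };

struct rxq_stub {
	struct dmabuf_binding_stub *binding; /* set by bind-rx, NULL otherwise */
};

enum mp_type { MP_PAGES, MP_DMABUF };

/* Core-side selection: the driver only names the queue it is creating
 * the pool for; the provider choice falls out of whether a dmabuf is
 * bound to that queue, with no provider field in the driver-visible
 * params. */
static enum mp_type pool_pick_provider(const struct rxq_stub *rxq)
{
	return rxq->binding ? MP_DMABUF : MP_PAGES;
}
```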
On Tue, 7 Nov 2023 14:23:20 -0800 Stanislav Fomichev wrote:
> Can we mark a socket as devmem-only? Do we have any use-case for those
> hybrid setups? Or, let me put it that way: do we expect API callers
> to handle both linear and non-linear cases correctly?
> As a consumer of the previous versions of these apis internally,
> I find all those corner cases confusing :-( Hence trying to understand
> whether we can make it a bit more rigid and properly defined upstream.
FWIW I'd also prefer to allow mixing. "Some NICs" can decide HDS
very flexibly, incl. landing full jumbo frames into the "headers".
There's no sender API today to signal how to mark the data for
selective landing, but if Mina already has the rx side written
to allow that...
On Sun, 5 Nov 2023 18:44:02 -0800 Mina Almasry wrote:
> + -
> + name: queues
> + doc: receive queues to bind the dma-buf to.
> + type: u32
> + multi-attr: true
I think that you should throw in the queue type.
I know you made the op imply RX:
> + -
> + name: bind-rx
but if we decide to create a separate "type" for some magic
queue one day, we'll have to ponder how to extend this API.
IMHO a queue should always be identified by a <type, id> tuple.
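For illustration, the tuple could look like the sketch below; the enum and struct names are hypothetical, not from the spec or the patch:

```c
#include <stdbool.h>

/* Hypothetical <type, id> queue identifier. */
enum queue_type_stub { QUEUE_TYPE_RX, QUEUE_TYPE_TX };

struct queue_tuple {
	enum queue_type_stub type;
	unsigned int id;
};

/* Two queues are the same only if both type and id match, so an RX
 * queue and a future magic queue with the same index never collide. */
static bool queue_tuple_eq(struct queue_tuple a, struct queue_tuple b)
{
	return a.type == b.type && a.id == b.id;
}
```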
On Sun, 5 Nov 2023 18:44:05 -0800 Mina Almasry wrote:
> +static int mp_dmabuf_devmem_init(struct page_pool *pool)
> +{
> + struct netdev_dmabuf_binding *binding = pool->mp_priv;
> +
> + if (!binding)
> + return -EINVAL;
> +
> + if (pool->p.flags & PP_FLAG_DMA_MAP ||
> + pool->p.flags & PP_FLAG_DMA_SYNC_DEV)
> + return -EOPNOTSUPP;
This looks backwards, we should _force_ the driver to use the dma
mapping built into the page pool APIs, to isolate the driver from
how the DMA addr actually gets obtained. Right?
Maybe seeing driver patches would illuminate.
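A sketch of the inverted check, with illustrative flag values (the real ones live in include/net/page_pool/types.h) and a simplified signature taking just the flags:

```c
#include <errno.h>

/* Illustrative stand-ins for the real PP_FLAG_* bits. */
#define PP_FLAG_DMA_MAP_STUB      (1U << 0)
#define PP_FLAG_DMA_SYNC_DEV_STUB (1U << 1)

/* Inverted check per the review: the devmem provider *requires* the
 * pool's built-in DMA mapping, so the driver is isolated from how the
 * DMA address actually gets obtained. */
static int mp_dmabuf_devmem_init_check(unsigned int flags)
{
	if (!(flags & PP_FLAG_DMA_MAP_STUB))
		return -EINVAL;
	return 0;
}
```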
On Thu, 09 Nov 2023 12:05:37 +0100 Paolo Abeni wrote:
> > I suppose we just disagree on the elegance of the API.
>
> FWIW, I think sockopt +cmsg is the right API.
FWIW it's fine by me as well.
My brain is slightly fried after trying to catch up on the thread
for close to 2h. So forgive me if I'm missing something.
This applies to all emails I'm about to send :)
On Sun, 5 Nov 2023 18:44:11 -0800 Mina Almasry wrote:
> + trigger_device_reset();
User space must not be responsible for the reset.
We can add some temporary "recreate page pools" ndo
until the queue API is ready.
But it should not be visible to the user in any way.
And then the kernel can issue the same reset when the netlink
socket dies to flush device free lists.
Maybe we should also add an "allow device/all-queues reload" flag
to the netlink API to differentiate drivers which can't implement
the full queue API later on. We want to make sure the defaults work
well in our "target design", rather than at the first stage. And the
target design will reload queues one by one.
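A toy sketch of the stopgap ndo and the kernel-side path that would invoke it on netlink socket death; neither the op nor any of these names exist in the tree, they are purely illustrative:

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-ins for net_device / net_device_ops. */
struct net_device_stub {
	int pools_recreated; /* counts teardown/rebuild cycles */
};

struct net_device_ops_stub {
	int (*ndo_recreate_page_pools)(struct net_device_stub *dev);
};

static int stub_recreate(struct net_device_stub *dev)
{
	dev->pools_recreated++; /* stand-in for tearing down and rebuilding */
	return 0;
}

/* Core-side path: invoked when the netlink socket dies, to flush device
 * free lists. User space never triggers this directly. */
static int netdev_flush_devmem(struct net_device_stub *dev,
			       const struct net_device_ops_stub *ops)
{
	if (!ops->ndo_recreate_page_pools)
		return -EOPNOTSUPP;
	return ops->ndo_recreate_page_pools(dev);
}
```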
Am 09.11.23 um 04:20 schrieb Mina Almasry:
> [SNIP]
>> But we might be able to do something as folio is doing now, mm subsystem
>> is still seeing 'struct folio/page', but other subsystem like slab is using
>> 'struct slab', and there is still some common fields shared between
>> 'struct folio' and 'struct slab'.
>>
> In my eyes this is almost exactly what I suggested in RFC v1 and got
> immediately nacked with no room to negotiate. What we did for v1 is to
> allocate struct pages for dma-buf to make dma-bufs look like struct
> page to mm subsystem. Almost exactly what you're describing above.
> It's a no-go. I don't think renaming struct page to netmem is going to
> move the needle (it also re-introduces code-churn). What I feel like I
> learnt is that dma-bufs are not struct pages and can't be made to look
> like one, I think.
Yeah, that pretty much hits the nail on the head. One thing not
mentioned yet that you could potentially try is to go the other way around.
In other words instead of importing a DMA-buf file descriptor into the
page-pool, you take some netdev page-pool pages and export them as
DMA-buf handle.
All you then need to do is implement the necessary DMA-buf interface,
e.g. wait for DMA-fences before accessing stuff, and when you have an
async operation, install a DMA-fence so that others can wait for your
operation to finish, etc.
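To make that contract concrete, here is a toy userspace model of the fencing obligation. The real kernel objects are struct dma_fence / dma_resv with proper locking and callbacks; everything below is a simplified stand-in:

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy model: readers wait for any installed fence before touching the
 * buffer; async writers install a fence and signal it when done. */
struct fence_stub { bool signalled; };

struct exported_buf {
	struct fence_stub *fence; /* NULL when nothing is in flight */
};

/* Begin an async op: install a fence others can wait on. */
static void buf_begin_async(struct exported_buf *b, struct fence_stub *f)
{
	f->signalled = false;
	b->fence = f;
}

/* Finish the async op: signal the fence so waiters may proceed. */
static void buf_finish_async(struct exported_buf *b)
{
	if (b->fence)
		b->fence->signalled = true;
}

/* Access is allowed only once any pending fence has signalled. */
static bool buf_access_allowed(const struct exported_buf *b)
{
	return b->fence == NULL || b->fence->signalled;
}
```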
This still doesn't give you peer-to-peer over, for example, the PCIe bus,
but it would give you zero copy in the sense that a GPU or other
acceleration device directly writes stuff into memory a network device
can work with.
We already have some similar functionality for at least Intel and AMD hw,
where a userptr mechanism is used to make malloc'ed memory (backed by
normal struct pages) available to the GPU. The problem with this
approach is that most GPUs currently can't do good recoverable page
faults, which in turn makes the whole MMU-notifier-based approach
horribly inefficient.
Just giving a few more pointers you might want to look into...
Cheers,
Christian.