On Tue, Nov 7, 2023 at 4:01 PM David Ahern dsahern@kernel.org wrote:
On 11/7/23 4:55 PM, Mina Almasry wrote:
On Mon, Nov 6, 2023 at 4:03 PM Willem de Bruijn willemdebruijn.kernel@gmail.com wrote:
On Mon, Nov 6, 2023 at 3:55 PM David Ahern dsahern@kernel.org wrote:
On 11/6/23 4:32 PM, Stanislav Fomichev wrote:
The concise notification API returns tokens as a range for compression, encoded as two 32-bit unsigned integers (start + length). It allows for even further batching by returning multiple such ranges in a single call.
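[Editor's sketch] To make the start + length encoding concrete, here is a minimal illustration of how a sorted run of tokens could be compressed into ranges. The names `token_range` and `compress_tokens` are hypothetical and do not come from the patch series; only the two 32-bit fields are taken from the description above.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout: one range covers `length` consecutive tokens. */
struct token_range {
    uint32_t start;   /* first token in the run */
    uint32_t length;  /* number of consecutive tokens, starting at `start` */
};

/*
 * Compress a sorted token array into ranges. Stops early if `out` is full.
 * Returns the number of ranges written.
 */
size_t compress_tokens(const uint32_t *tokens, size_t n,
                       struct token_range *out, size_t max_out)
{
    size_t nr = 0;

    for (size_t i = 0; i < n; i++) {
        /* Extend the current range when the token is the next in sequence. */
        if (nr > 0 && out[nr - 1].start + out[nr - 1].length == tokens[i]) {
            out[nr - 1].length++;
            continue;
        }
        if (nr == max_out)
            break;
        out[nr].start = tokens[i];
        out[nr].length = 1;
        nr++;
    }
    return nr;
}
```

Returning several such ranges from one call is what gives the additional batching mentioned above.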
Tangential: should tokens be u64? Otherwise we can't have more than 4 GB unacknowledged. Or is that a reasonable constraint?
I was thinking the same, with bits reserved for a dmabuf id to allow multiple dmabufs in a single rx queue (a future extension, but build the capability in now). For example: a 37b offset (128GB dmabuf size), a 19b length (large GRO), and an 8b dmabuf id (lots of dmabufs to a queue).
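[Editor's sketch] The 37 + 19 + 8 bit split fills exactly 64 bits. The helpers below show one possible packing. The field order (offset in the low bits, dmabuf id in the top byte) and all names are assumptions made for illustration; the thread only proposes the field widths.

```c
#include <stdint.h>

/* Field widths as proposed in the thread; the bit positions are assumed. */
#define TOK_OFFSET_BITS 37  /* 2^37 bytes = 128 GB addressable per dmabuf */
#define TOK_LENGTH_BITS 19  /* up to 2^19 - 1 bytes per token */
#define TOK_ID_BITS      8  /* up to 256 dmabufs per queue */

#define TOK_OFFSET_MASK ((UINT64_C(1) << TOK_OFFSET_BITS) - 1)
#define TOK_LENGTH_MASK ((UINT64_C(1) << TOK_LENGTH_BITS) - 1)
#define TOK_LENGTH_SHIFT TOK_OFFSET_BITS
#define TOK_ID_SHIFT     (TOK_OFFSET_BITS + TOK_LENGTH_BITS)

/* Pack the three fields into one u64; out-of-range bits are masked off. */
static inline uint64_t token_pack(uint64_t offset, uint32_t length, uint8_t id)
{
    return (offset & TOK_OFFSET_MASK) |
           (((uint64_t)length & TOK_LENGTH_MASK) << TOK_LENGTH_SHIFT) |
           ((uint64_t)id << TOK_ID_SHIFT);
}

static inline uint64_t token_offset(uint64_t tok)
{
    return tok & TOK_OFFSET_MASK;
}

static inline uint32_t token_length(uint64_t tok)
{
    return (uint32_t)((tok >> TOK_LENGTH_SHIFT) & TOK_LENGTH_MASK);
}

static inline uint8_t token_dmabuf_id(uint64_t tok)
{
    return (uint8_t)(tok >> TOK_ID_SHIFT);
}
```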
Agreed. Converting to 64b now sounds like a good forward looking revision.
The concept of IDing a dma-buf came up in a couple of different contexts. First, in the context of us giving the dma-buf ID to the user on recvmsg() to tell the user the data is in this specific dma-buf. The second context is here, to bind dma-bufs with multiple user-visible IDs to an rx queue.
My issue here is that I don't see anything in the struct dma_buf that can practically serve as an ID:
https://elixir.bootlin.com/linux/v6.6-rc7/source/include/linux/dma-buf.h#L30...
Actually, from userspace, only the name of the dma-buf seems queryable, and that is only unique if the user sets it as such. The dma-buf FD can't serve as an ID either: for our use case we need to support one process doing the dma-buf bind via netlink and sharing the dma-buf FD with another process, which then receives the data. In this case the FD numbers seen by the two processes may be different.
Converting to 64b is a trivial change I can make now, but I'm not sure how to ID these dma-bufs. Suggestions welcome. I'm not sure the dma-buf guys will allow adding a new ID + APIs to query said dma-buf ID.
The API can be unique to this usage: e.g., add a dmabuf id to the netlink API. Userspace manages the ids (tells the kernel what value to use with an instance), the kernel validates no 2 dmabufs have the same id and then returns the value here.
Seems reasonable, will do.
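[Editor's sketch] The scheme suggested above (userspace picks the id, the kernel rejects duplicates and later echoes the id back) could look roughly like the following. This is plain C modelling the bookkeeping only; it is not kernel or netlink code, and every name here (`rxq_bindings`, `rxq_bind_dmabuf`, `rxq_lookup_dmabuf`) is hypothetical. The 256-slot table assumes the 8-bit id width floated earlier in the thread.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_BINDINGS 256  /* assumed: one slot per possible 8-bit id */

struct dmabuf_binding {
    const void *dmabuf;  /* stand-in for a struct dma_buf pointer */
    int in_use;
};

struct rxq_bindings {
    struct dmabuf_binding slot[MAX_BINDINGS];
};

/*
 * Bind a dmabuf under a userspace-chosen id.
 * Returns 0 on success, -EEXIST if the id is already taken.
 */
int rxq_bind_dmabuf(struct rxq_bindings *q, uint8_t id, const void *dmabuf)
{
    if (dmabuf == NULL)
        return -EINVAL;
    if (q->slot[id].in_use)
        return -EEXIST;
    q->slot[id].dmabuf = dmabuf;
    q->slot[id].in_use = 1;
    return 0;
}

/* Resolve an id back to its dmabuf, or NULL if nothing is bound. */
const void *rxq_lookup_dmabuf(const struct rxq_bindings *q, uint8_t id)
{
    return q->slot[id].in_use ? q->slot[id].dmabuf : NULL;
}
```

Because the id is chosen by the application rather than derived from an FD, two processes sharing the same dma-buf would agree on it even when their FD numbers differ.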
On Wed, Nov 8, 2023 at 7:36 AM Edward Cree ecree.xilinx@gmail.com wrote:
On 06/11/2023 21:17, Stanislav Fomichev wrote:
I guess I'm just wondering whether other people have any suggestions here. Not sure Jonathan's way was better, but we fundamentally have two queues between the kernel and the userspace:
- userspace receiving tokens (recvmsg + magical flag)
- userspace refilling tokens (setsockopt + magical flag)
So having some kind of shared memory producer-consumer queue feels natural. And using 'classic' socket api here feels like a stretch, idk.
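[Editor's sketch] For readers unfamiliar with the idea, a shared-memory producer-consumer queue for tokens might be shaped like the single-producer, single-consumer ring below. This is a generic illustration using C11 atomics, not anything proposed in the patches; the names and the ring size are made up, and a real kernel/userspace ring would also need mmap setup and a wakeup mechanism.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define RING_SIZE 8u  /* must be a power of two */

struct token_ring {
    _Atomic uint32_t head;      /* advanced only by the producer */
    _Atomic uint32_t tail;      /* advanced only by the consumer */
    uint64_t slots[RING_SIZE];
};

/* Producer side: returns false when the ring is full. */
bool ring_push(struct token_ring *r, uint64_t token)
{
    uint32_t head = atomic_load_explicit(&r->head, memory_order_relaxed);
    uint32_t tail = atomic_load_explicit(&r->tail, memory_order_acquire);

    if (head - tail == RING_SIZE)
        return false;
    r->slots[head & (RING_SIZE - 1)] = token;
    /* Publish the slot before making it visible to the consumer. */
    atomic_store_explicit(&r->head, head + 1, memory_order_release);
    return true;
}

/* Consumer side: returns false when the ring is empty. */
bool ring_pop(struct token_ring *r, uint64_t *token)
{
    uint32_t tail = atomic_load_explicit(&r->tail, memory_order_relaxed);
    uint32_t head = atomic_load_explicit(&r->head, memory_order_acquire);

    if (head == tail)
        return false;
    *token = r->slots[tail & (RING_SIZE - 1)];
    /* Release the slot back to the producer. */
    atomic_store_explicit(&r->tail, tail + 1, memory_order_release);
    return true;
}
```

In the setup described above there would be two such rings: one where the kernel produces received tokens, and one where userspace produces refilled tokens.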
Do 'refilled tokens' (returned memory areas) get used for anything other than subsequent RX?
Hi Ed!
Not really, it's only the subsequent RX.
If not then surely the way to return a memory area in an io_uring idiom is just to post a new read sqe ('RX descriptor') pointing into it, rather than explicitly returning it with setsockopt.
We're interested in using this with regular TCP sockets, not necessarily io_uring. The io_uring interface to devmem TCP may very well use what you suggest and can drop the setsockopt.
(Being async means you can post lots of these, unlike recvmsg(), so you don't need any kernel management to keep the RX queue filled; it can all be handled by userland, which simplifies the APIs overall.) Or am I misunderstanding something?
-e
-- Thanks, Mina