On 2/3/25 11:39 PM, Mina Almasry wrote:
Augment the dmabuf binding to be able to handle TX. In addition to all the RX binding state, we also create the tx_vec needed for the TX path.
Provide API for sendmsg to be able to send dmabufs bound to this device:
- Provide a new dmabuf_tx_cmsg which includes the dmabuf to send from.
- MSG_ZEROCOPY with SCM_DEVMEM_DMABUF cmsg indicates send from dma-buf.
Devmem is uncopyable, so we piggyback on the existing MSG_ZEROCOPY implementation, while disabling the cases where MSG_ZEROCOPY falls back to copying.
We additionally pipe the binding down to the new zerocopy_fill_skb_from_devmem, which fills a TX skb with net_iov netmems instead of the traditional page netmems.
We also special-case skb_frag_dma_map to return the DMA address of these dmabuf net_iovs instead of attempting to map pages.
Based on work by Stanislav Fomichev <sdf@fomichev.me>. A lot of the meat of the implementation came from the devmem TCP RFC v1 [1], which included the TX path, but Stan did all the rebasing on top of netmem/net_iov.
Cc: Stanislav Fomichev <sdf@fomichev.me>
Signed-off-by: Kaiyuan Zhang <kaiyuanz@google.com>
Signed-off-by: Mina Almasry <almasrymina@google.com>
Very minor nit: you unexpectedly left a lot of empty lines after the SoB tags.
[...] @@ -240,13 +249,23 @@ net_devmem_bind_dmabuf(struct net_device *dev, unsigned int dmabuf_fd,
	 * binding can be much more flexible than that. We may be able to
	 * allocate MTU sized chunks here. Leave that for future work...
	 */
-	binding->chunk_pool =
-		gen_pool_create(PAGE_SHIFT, dev_to_node(&dev->dev));
+	binding->chunk_pool = gen_pool_create(PAGE_SHIFT,
+					      dev_to_node(&dev->dev));
 	if (!binding->chunk_pool) {
 		err = -ENOMEM;
 		goto err_unmap;
 	}
 
+	if (direction == DMA_TO_DEVICE) {
+		binding->tx_vec = kvmalloc_array(dmabuf->size / PAGE_SIZE,
+						 sizeof(struct net_iov *),
+						 GFP_KERNEL);
+		if (!binding->tx_vec) {
+			err = -ENOMEM;
+			goto err_free_chunks;
+		}
+	}
+
It looks like the later error paths (in the for_each_sgtable_dma_sg() loop) can be hit even when 'direction == DMA_TO_DEVICE', so I guess an additional error label is needed to free tx_vec on such paths.
/P