Re: [PATCH net-next v3 0/6] Device memory TCP TX

5 Feb 2025


On 02/04, Samiullah Khawaja wrote:
...
On Tue, Feb 4, 2025 at 11:43 AM Stanislav Fomichev <stfomichev@gmail.com> wrote:
...
On 02/04, Mina Almasry wrote:
...
On Tue, Feb 4, 2025 at 10:06 AM Stanislav Fomichev <stfomichev@gmail.com> wrote:
...
On 02/04, Mina Almasry wrote:
...
On Tue, Feb 4, 2025 at 4:32 AM Paolo Abeni <pabeni@redhat.com> wrote:
...
On 2/3/25 11:39 PM, Mina Almasry wrote:
> The TX path had been dropped from the Device Memory TCP patch series
> post RFCv1 [1], to make that series slightly easier to review. This
> series rebases the implementation of the TX path on top of the
> net_iov/netmem framework agreed upon and merged. The motivation for
> the feature is thoroughly described in the docs & cover letter of the
> original proposal, so I don't repeat the lengthy descriptions here, but
> they are available in [1].
>
> Sending this series as RFC as the window closure is imminent. I plan on
> reposting as non-RFC once the tree re-opens, addressing any feedback
> I receive in the meantime.
I guess you should drop this paragraph.
> Full outline on usage of the TX path is detailed in the documentation
> added in the first patch.
>
> Test example is available via the kselftest included in the series as well.
>
> The series is relatively small, as the TX path for this feature largely
> piggybacks on the existing MSG_ZEROCOPY implementation.
It looks like no additional device-level support is required. That is,
IMHO, almost suspiciously good :)
That is correct: no additional device-level support is required. I don't
have any local changes to my driver to make this work. I think Stan
was able to run the TX path on-list (he commented on fixes to the test
but didn't say it doesn't work :D) and one other person was able to
run it off-list.
For BRCM I had shared this: https://lore.kernel.org/netdev/ZxAfWHk3aRWl-F31@mini-arch/
I have a similar internal patch for mlx5 (will share after the RX part gets
in). I agree that gve_unmap_packet seems to need some work to be more
careful not to unmap NIOVs (if you were testing against gve).
Hmm. I think you're right. We ran into a similar issue with the RX
path. The RX path worked 'fine' on initial merge, but it was passing
dmabuf dma-addrs to the dma-mapping API, which Jason later called out
as unsafe. I think dma-mapping API calls with dmabuf dma-addrs boil
down to no-ops on a lot of setups, which is why I'm not running into
any issues in testing. But upon closer look, I think yes, we need to
make sure the driver doesn't end up passing these niov dma-addrs to
functions like dma_unmap_*() and dma_sync_*().
Stan, do you run into issues (crashes/warnings/bugs) in your setup
when the driver tries to unmap niovs? Or did you implement these
changes purely for safety?
I don't run into any issues with those unmaps in place, but I'm running x86
with iommu bypass (and as you mention in the other thread, those
calls are no-ops in this case).
The dma_addr from a dma-buf should never enter the dma_* APIs. dma-buf
exporters have their own implementation of these ops, and they could be
no-ops for identity mappings or when the IOMMU is disabled (in a VM? with
no IOMMU enabled, GPA=IOVA). So if we really want to map/unmap/sync
these addresses, the dma-buf APIs should be used to do that. Maybe some
glue with a memory provider is required for these net_iovs? I think
the safest option is that these mappings are never unmapped manually
by the driver until dma_buf_unmap_attachment is called during
unbinding? But maybe that complicates things for io_uring?
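
To make the lifetime rule above concrete, here is a rough userspace sketch,
not kernel code. Every name in it (mock_binding, mock_bind, mock_tx_submit,
mock_tx_complete, mock_unbind) is hypothetical. The only thing taken from
the thread is the idea that the mapping is released once, at unbind time,
where dma_buf_unmap_attachment would be called. Refusing to unbind while TX
frags are still in flight is an extra assumption of this sketch.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model; none of these names are kernel APIs. */
struct mock_binding {
	bool mapped;          /* true between bind and unbind */
	int exporter_unmaps;  /* how often the exporter-side unmap ran */
	int inflight;         /* TX frags still borrowing the mapping */
};

/* Stands in for the point where the dma-buf attachment gets mapped. */
void mock_bind(struct mock_binding *b)
{
	b->mapped = true;
	b->exporter_unmaps = 0;
	b->inflight = 0;
}

/* A TX frag borrows an address from the binding; no new mapping is made. */
void mock_tx_submit(struct mock_binding *b)
{
	assert(b->mapped);
	b->inflight++;
}

/* TX completion only drops the borrow; it never unmaps anything. */
void mock_tx_complete(struct mock_binding *b)
{
	assert(b->inflight > 0);
	b->inflight--;
}

/*
 * Stands in for the unbind path, the only place the mapping is released.
 * Assumption of this sketch: unbind is refused while frags are in flight.
 */
bool mock_unbind(struct mock_binding *b)
{
	if (!b->mapped || b->inflight > 0)
		return false;
	b->mapped = false;
	b->exporter_unmaps++;
	return true;
}
```

In this model the exporter-side unmap runs exactly once per binding, no
matter how many TX completions happen in between.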
Correct, we don't want to call dma_* APIs on NIOVs, but currently we
do (unmap on TX completion). As I mentioned in [0] in another thread, we
need something similar for gve (and eventually mlx). skb_frag_dma_map hides
the mapping, but the unmapping calls the dma_* APIs explicitly and
unconditionally (in most drivers I've looked at).
0: https://lore.kernel.org/netdev/ZxAfWHk3aRWl-F31@mini-arch/
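
As a rough illustration of the kind of check discussed here, below is a
userspace sketch of a TX completion loop that skips the unmap for
NIOV-backed frags. It is not the BRCM patch from [0] and not gve code. The
types and the function (mock_frag, mock_stats, mock_tx_unmap_frags) are
hypothetical, and the counters only stand in for the real dma_unmap_*()
calls.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical userspace model; these are not kernel types or functions. */
struct mock_frag {
	uint64_t dma_addr;
	bool is_net_iov;      /* true when the frag is backed by dma-buf memory */
};

struct mock_stats {
	int dma_unmap_calls;  /* stands in for dma_unmap_*() invocations */
	int skipped_niovs;    /* frags deliberately left mapped */
};

/* Stand-in for a driver's unmap loop on TX completion. */
void mock_tx_unmap_frags(const struct mock_frag *frags, size_t n,
			 struct mock_stats *st)
{
	assert(frags != NULL || n == 0);

	for (size_t i = 0; i < n; i++) {
		if (frags[i].is_net_iov) {
			/* The driver did not create this mapping, so it
			 * must not hand the address to the DMA API. */
			st->skipped_niovs++;
			continue;
		}
		/* Page-backed frag: the driver mapped it, so it unmaps it. */
		st->dma_unmap_calls++;
	}
}
```

With one NIOV frag among three, only the two page-backed frags reach the
unmap branch; the NIOV frag is counted as skipped and left alone.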
