On 02/04, Samiullah Khawaja wrote:
On Tue, Feb 4, 2025 at 11:43 AM Stanislav Fomichev stfomichev@gmail.com wrote:
On 02/04, Mina Almasry wrote:
On Tue, Feb 4, 2025 at 10:06 AM Stanislav Fomichev stfomichev@gmail.com wrote:
On 02/04, Mina Almasry wrote:
On Tue, Feb 4, 2025 at 4:32 AM Paolo Abeni pabeni@redhat.com wrote:
On 2/3/25 11:39 PM, Mina Almasry wrote:
> The TX path had been dropped from the Device Memory TCP patch series
> post RFCv1 [1], to make that series slightly easier to review. This
> series rebases the implementation of the TX path on top of the
> net_iov/netmem framework agreed upon and merged. The motivation for
> the feature is thoroughly described in the docs & cover letter of the
> original proposal, so I don't repeat the lengthy descriptions here, but
> they are available in [1].
>
> Sending this series as RFC as the window closure is imminent. I plan on
> reposting as non-RFC once the tree re-opens, addressing any feedback
> I receive in the meantime.
I guess you should drop this paragraph.
> Full outline on usage of the TX path is detailed in the documentation
> added in the first patch.
>
> Test example is available via the kselftest included in the series as well.
>
> The series is relatively small, as the TX path for this feature largely
> piggybacks on the existing MSG_ZEROCOPY implementation.
It looks like no additional device-level support is required. That is IMHO good to the point of being suspicious :)
Correct, no additional device-level support is required. I don't have any local changes to my driver to make this work. I think Stan was able to run the TX path on-list (he commented on fixes to the test but didn't say it doesn't work :D), and one other person was able to run it off-list.
For BRCM I had shared this: https://lore.kernel.org/netdev/ZxAfWHk3aRWl-F31@mini-arch/ I have a similar internal patch for mlx5 (will share after the RX part gets in). I agree that it seems like gve_unmap_packet needs some work to be more careful not to unmap NIOVs (if you were testing against gve).
Hmm. I think you're right. We ran into a similar issue with the RX path. The RX path worked 'fine' on initial merge, but it was passing dmabuf dma-addrs to the dma-mapping API, which Jason later called out as unsafe. The dma-mapping API calls with dmabuf dma-addrs boil down to no-ops on a lot of setups, I think, which is why I'm not running into any issues in testing. But upon closer look, I think yes, we need to make sure the driver doesn't end up passing these niov dma-addrs to functions like dma_unmap_*() and dma_sync_*().
Stan, do you run into issues (crashes/warnings/bugs) in your setup when the driver tries to unmap niovs? Or did you implement these changes purely for safety?
I don't run into any issues with those unmaps in place, but I'm running x86 with iommu bypass (and as you mention in the other thread, those calls are no-ops in this case).
The dma_addr from a dma-buf should never enter the dma_* APIs. dma-buf exporters have their own implementation of these ops, and they could be no-ops for identity mappings or when the IOMMU is disabled (in a VM? With no IOMMU enabled, GPA=IOVA). So if we really want to map/unmap/sync these addresses, the dma-buf APIs should be used to do that. Maybe some glue with a memory provider is required for these net_iovs? I think the safest option is that these mappings are never unmapped manually by the driver, and stay in place until dma_buf_unmap_attachment is called during unbinding? But maybe that complicates things for io_uring?
Correct, we don't want to call dma_* APIs on NIOVs, but currently we do (unmap on TX completion). As I mentioned in another thread [0], we need something similar for gve (and eventually mlx). skb_frag_dma_map hides the mapping, but the unmapping explicitly and unconditionally calls the dma_* APIs (in most drivers I've looked at).
0: https://lore.kernel.org/netdev/ZxAfWHk3aRWl-F31@mini-arch/
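
To make the point about TX completion concrete, here is a minimal userspace sketch of the check being discussed. It is not the gve, bnxt or mlx5 code: struct tx_buf, the is_net_iov flag, mock_dma_unmap_page() and tx_unmap_packet() are all hypothetical stand-ins made up for illustration. In a real driver the flag would presumably be derived at map time from a helper such as netmem_is_net_iov(), and the mock would be the real dma_unmap_*() call.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical userspace stand-ins; none of these are kernel definitions. */
typedef uint64_t dma_addr_t;

struct tx_buf {
	dma_addr_t dma;   /* address handed to the NIC */
	size_t len;       /* 0 means "nothing mapped" */
	bool is_net_iov;  /* assumed to be recorded when the frag was mapped */
};

/* Counts calls into the mock DMA API so the behaviour can be observed. */
static unsigned int dma_unmap_calls;

/* Mock standing in for dma_unmap_page(); a real driver calls the DMA API. */
static void mock_dma_unmap_page(dma_addr_t dma, size_t len)
{
	(void)dma;
	(void)len;
	dma_unmap_calls++;
}

/* TX completion for one buffer: only driver-owned mappings are unmapped. */
static void tx_unmap_buf(struct tx_buf *buf)
{
	assert(buf != NULL);

	if (buf->len == 0)
		return; /* already released */

	/*
	 * A net_iov address belongs to the dma-buf binding. It stays mapped
	 * until the binding is torn down, so the driver must not hand it to
	 * the dma-mapping API here.
	 */
	if (!buf->is_net_iov)
		mock_dma_unmap_page(buf->dma, buf->len);

	buf->dma = 0;
	buf->len = 0;
}

/* Release every buffer of a completed packet. */
static void tx_unmap_packet(struct tx_buf *bufs, size_t n)
{
	for (size_t i = 0; i < n; i++)
		tx_unmap_buf(&bufs[i]);
}
```

The sketch assumes the driver remembers per buffer whether it came from a net_iov, so the completion path never has to inspect the address itself.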