On Fri, Jun 06, 2025 at 01:20:48PM +0200, Christian König wrote:
dmabuf acts as a driver and shouldn't be handled by VFS, so I made dmabuf implement copy_file_range callbacks to support direct I/O zero-copy. I'm open to both approaches. What's the preference of VFS experts?
That would probably be illegal. Using the sg_table in the DMA-buf implementation turned out to be a mistake.
Two things here that should not be directly conflated. Using the sg_table was a huge mistake, and we should try to switch dmabuf to a pure dma_addr_t/len array now that the new DMA API supporting that has been merged. Is there any chance the dma-buf maintainers could start to kick this off? I'm of course happy to assist.
But that notwithstanding, dma-buf is THE buffer sharing mechanism in the kernel, and we should promote it instead of reinventing it badly. And there is a use case for having a fully DMA mapped buffer in the block layer and I/O path, especially on systems with an IOMMU. So having an iov_iter backed by a dma-buf would be extremely helpful. That's mostly lib/iov_iter.c code, not VFS, though.
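For illustration only, a minimal userspace sketch of what a pure dma_addr_t/len array could look like, as opposed to an sg_table that also carries struct page references. Everything here is an assumption: struct dma_range and both helpers are hypothetical names, not existing kernel API, and dma_addr_t is typedef'd locally as a stand-in for the kernel type.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the kernel's dma_addr_t (assumption, userspace model only). */
typedef uint64_t dma_addr_t;

/* Hypothetical: one contiguous DMA-mapped range, with no page attached. */
struct dma_range {
	dma_addr_t addr;	/* bus/DMA address of the first byte */
	size_t len;		/* length of the range in bytes */
};

/* Sum the lengths of all ranges, i.e. the size of the mapped buffer. */
static size_t dma_ranges_total(const struct dma_range *r, size_t n)
{
	size_t total = 0;

	for (size_t i = 0; i < n; i++)
		total += r[i].len;
	return total;
}

/*
 * Translate a byte offset into the buffer to a DMA address.
 * Returns 0 and stores the address in *out on success,
 * or -1 if the offset lies beyond the last range.
 */
static int dma_ranges_lookup(const struct dma_range *r, size_t n,
			     size_t off, dma_addr_t *out)
{
	assert(out != NULL);

	for (size_t i = 0; i < n; i++) {
		if (off < r[i].len) {
			*out = r[i].addr + off;
			return 0;
		}
		off -= r[i].len;	/* skip this range, keep walking */
	}
	return -1;
}
```

The point of the sketch is that a consumer only ever sees addresses and lengths, so nothing in the structure invites touching the backing pages.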
The question Christoph raised was rather why is your CPU so slow that walking the page tables has a significant overhead compared to the actual I/O?
Yes, that's really puzzling and should be addressed first.