On Tue, Jul 16, 2024 at 06:14:48PM +0800, Huan Yang wrote:
On 2024/7/16 17:31, Daniel Vetter wrote:
On Tue, Jul 16, 2024 at 10:48:40AM +0800, Huan Yang wrote:
I've just been researching udmabuf; please correct me if I'm wrong.
On 2024/7/15 20:32, Christian König wrote:
On 15.07.24 11:11, Daniel Vetter wrote:
On Thu, Jul 11, 2024 at 11:00:02AM +0200, Christian König wrote:
On 11.07.24 09:42, Huan Yang wrote:
> Some users may need to load a file into a dma-buf; the current way is:
> 1. allocate a dma-buf, get a dma-buf fd
> 2. mmap the dma-buf fd into vaddr
> 3. read(file_fd, vaddr, fsz)
> This is too heavy if fsz reaches GB scale.

You need to describe a bit more why that is too heavy. I can only assume you need to save memory bandwidth and avoid the extra copy with the CPU.
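For reference, the quoted three-step path looks roughly like the sketch below in userspace (assuming the standard system dma-heap at /dev/dma_heap/system and the DMA_HEAP_IOCTL_ALLOC ioctl; most error handling omitted):

#include <fcntl.h>
#include <linux/dma-heap.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Sketch of the quoted flow: allocate from the system heap, mmap the
 * dma-buf, then do a buffered read of the whole file through the CPU.
 * The data travels file -> page cache -> dma-buf pages, which is the
 * extra copy and memory bandwidth cost being discussed.
 */
static void *load_file_into_dmabuf(int file_fd, size_t fsz, int *dmabuf_fd)
{
        struct dma_heap_allocation_data alloc = {
                .len = fsz,
                .fd_flags = O_RDWR | O_CLOEXEC,
        };
        int heap_fd;
        void *vaddr;
        size_t off;

        /* 1. allocate a dma-buf, get dma-buf fd */
        heap_fd = open("/dev/dma_heap/system", O_RDWR);
        if (heap_fd < 0 || ioctl(heap_fd, DMA_HEAP_IOCTL_ALLOC, &alloc) < 0)
                return NULL;
        close(heap_fd);

        /* 2. mmap dma-buf fd into vaddr */
        vaddr = mmap(NULL, fsz, PROT_READ | PROT_WRITE, MAP_SHARED,
                     alloc.fd, 0);
        if (vaddr == MAP_FAILED)
                return NULL;

        /* 3. read(file_fd, vaddr, fsz), buffered, with one extra CPU copy */
        for (off = 0; off < fsz; ) {
                ssize_t n = read(file_fd, (char *)vaddr + off, fsz - off);
                if (n <= 0)
                        break;
                off += n;
        }

        *dmabuf_fd = alloc.fd;
        return vaddr;
}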
> This patch implements a feature called DMA_HEAP_IOCTL_ALLOC_READ_FILE.
> The user needs to offer a file_fd which they want to load into the
> dma-buf; then it promises that if you get a dma-buf fd, it will contain
> the file content.

Interesting idea, that has at least more potential than trying to enable direct I/O on mmap()ed DMA-bufs.
The approach with the new IOCTL might not work because it is a very specialized use case.
But IIRC there was a copy_file_range callback in the file_operations structure you could use for that. I'm just not sure when and how that's used with the copy_file_range() system call.
I'm not sure any of those help, because internally they're all still based on struct page (or maybe in the future on folios). And that's the thing dma-buf can't give you, at least without peeking behind the curtain.
I think an entirely different option would be malloc+udmabuf. That essentially handles the impedance mismatch between direct I/O and dma-buf on the dma-buf side. The downside is that it'll make the permanently pinned memory accounting and tracking issues even more apparent, but I guess eventually we do need to sort that one out.
Oh, very good idea! Just one minor correction: it's not malloc+udmabuf, but rather create_memfd()+udmabuf.
Hm right, it's create_memfd() + mmap(memfd) + udmabuf
And you need to complete your direct I/O before creating the udmabuf since that reference will prevent direct I/O from working.
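For concreteness, a minimal userspace sketch of that ordering (direct I/O read first, udmabuf creation afterwards) could look like this, assuming the existing UDMABUF_CREATE ioctl on /dev/udmabuf, page- and block-aligned sizes, and with most error handling omitted:

#define _GNU_SOURCE
#include <fcntl.h>
#include <linux/udmabuf.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Sketch only: read the file into memfd-backed memory with O_DIRECT
 * first, then hand the pages to udmabuf, which pins them.  Whether the
 * direct I/O into the memfd mapping actually performs as hoped is
 * exactly what still needs to be tested in this thread.
 */
static int file_to_udmabuf(const char *path, size_t size)
{
        struct udmabuf_create create;
        int memfd, file_fd, udmabuf_dev, dmabuf_fd;
        void *vaddr;
        size_t off;

        memfd = memfd_create("file-src", MFD_CLOEXEC | MFD_ALLOW_SEALING);
        ftruncate(memfd, size);
        vaddr = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED,
                     memfd, 0);

        /* Complete the direct I/O read before anything pins the pages. */
        file_fd = open(path, O_RDONLY | O_DIRECT);
        for (off = 0; off < size; ) {
                ssize_t n = read(file_fd, (char *)vaddr + off, size - off);
                if (n <= 0)
                        break;
                off += n;
        }
        close(file_fd);

        /* udmabuf requires the memfd to be sealed against shrinking. */
        fcntl(memfd, F_ADD_SEALS, F_SEAL_SHRINK);

        /* Only now wrap (and thereby pin) the pages as a dma-buf. */
        create.memfd = memfd;
        create.flags = UDMABUF_FLAGS_CLOEXEC;
        create.offset = 0;
        create.size = size;
        udmabuf_dev = open("/dev/udmabuf", O_RDWR);
        dmabuf_fd = ioctl(udmabuf_dev, UDMABUF_CREATE, &create);

        close(udmabuf_dev);
        close(memfd);
        return dmabuf_fd;
}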
udmabuf will pin all pages, so once the fd is returned, direct I/O can no longer be triggered (same as with dma-buf). So the read must be completed before pinning.
Why does pinning prevent direct I/O? I haven't tested, but I'd expect the rdma folks would be really annoyed if that's the case ...
But the current udmabuf code uses `memfd_pin_folios` to speed up allocation and pinning together, so this approach may need to be adapted to suit that.
I currently doubt that the udmabuf solution is suitable for our gigabyte-level read operations.
- The current mmap operation uses faulting, so frequent page faults will be
triggered during reads, resulting in a lot of context switching overhead.
- The current udmabuf size limit is 64MB. Even though it can be changed, is it
really a good fit for sizes this large?
Yeah that's just a figleaf so we don't have to bother about the accounting issue.
- The migration and adaptation of the driver is also a challenge, and
currently, we are unable to control it.
Why does a udmabuf fd not work instead of any other dmabuf fd? That shouldn't matter for the consuming driver ...
Hmm, our production drivers are provided by other OEMs. I see that many of them implement
their own dma_buf_ops. These may not be generic and may require them to reimplement things.
Yeah, for exporting a buffer object allocated by that driver. But any competent gles/vk stack also supports importing dma-buf, and that should work with udmabuf exactly the same way as with a dma-buf allocated from the system heap.
Perhaps implementing `copy_file_range` would be more suitable for us.
See my other mail, fundamentally these all rely on struct page being present, and dma-buf doesn't give you that. Which means you need to go below the dma-buf abstraction. And udmabuf is pretty much the thing for that, because it wraps normal struct page memory into a dmabuf.
Yes, udmabuf gives us this. I am very interested in whether the pages provided by udmabuf can trigger direct I/O.
So I'll give it a test and report back soon.
And copy_file_range on the underlying memfd might already work, I haven't checked though.
I have doubts.
I recently tested and found that I need to modify many places in vfs_copy_file_range in order to make copy_file_range work with a DMA_BUF fd. (I have managed to get it working,
I'm talking about memfd, not dma-buf here. I think copy_file_range to dma-buf is as architecturally unsound as allowing O_DIRECT on the dma-buf mmap.
Cheers, Sima
but I don't think the implementation is good enough, so I can't provide the source code.)
Maybe memfd can work, maybe not; let's give it a test. :)
Anyway, it's a good idea too. I currently need to focus on whether it can be achieved, as well as the performance comparison.
Cheers, Sima
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch/
On Wed, Jul 17, 2024 at 05:15:07PM +0200, Daniel Vetter wrote:
I'm talking about memfd, not dma-buf here. I think copy_file_range to dma-buf is as architecturally unsound as allowing O_DIRECT on the dma-buf mmap.
copy_file_range only works inside the same file system anyway, so it is completely irrelevant here.
What should work just fine is using sendfile (or splice if you like it complicated) to write TO the dma-buf. That just iterates over the page cache of the source file and calls ->write_iter with the page cache pages. Of course that requires that you actually implement ->write_iter, but given that dma-bufs support mmapping I can't see why you should not be able to write to them.
Reading FROM the dma-buf in that fashion should also work if you provide a ->read_iter; wire up ->splice_read to copy_splice_read so that it doesn't require any page cache.
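For the write direction, the userspace side of that suggestion would be something like the sketch below. This is purely hypothetical until a dma-buf exporter actually implements ->write_iter:

#include <sys/sendfile.h>
#include <sys/types.h>

/*
 * Hypothetical: stream file_fd into dmabuf_fd with sendfile(), assuming
 * the dma-buf file implemented ->write_iter as suggested above (no
 * in-tree exporter does yet).  The kernel would then copy straight from
 * the source file's page cache into the dma-buf pages, with no
 * intermediate userspace buffer.
 */
static int fill_dmabuf_from_file(int dmabuf_fd, int file_fd, size_t fsz)
{
        off_t off = 0;

        while ((size_t)off < fsz) {
                ssize_t n = sendfile(dmabuf_fd, file_fd, &off, fsz - off);
                if (n <= 0)
                        return -1;
        }
        return 0;
}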