On Tue, Jul 16, 2024 at 06:14:48PM +0800, Huan Yang wrote:
On 2024/7/16 17:31, Daniel Vetter wrote:
On Tue, Jul 16, 2024 at 10:48:40AM +0800, Huan Yang wrote:
I just looked into udmabuf. Please correct me if I'm wrong.
On 2024/7/15 20:32, Christian König wrote:
Am 15.07.24 um 11:11 schrieb Daniel Vetter:
On Thu, Jul 11, 2024 at 11:00:02AM +0200, Christian König wrote:
On 11.07.24 at 09:42, Huan Yang wrote:
> Some users may need to load a file into a dma-buf. The current way is:
> 1. allocate a dma-buf, get a dma-buf fd
> 2. mmap the dma-buf fd into vaddr
> 3. read(file_fd, vaddr, fsz)
> This is too heavy if fsz reaches the GB range.

You need to describe a bit more why that is too heavy. I can only assume you need to save memory bandwidth and avoid the extra copy with the CPU.
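For context, here is a minimal userspace sketch of that quoted three-step path, assuming the generic system dma-heap at /dev/dma_heap/system; the helper name is hypothetical, and error handling plus the DMA_BUF_IOCTL_SYNC bracketing are omitted:

/*
 * Sketch only: the "heavy" path described above. The heap name may
 * differ per platform and all error handling is abbreviated.
 */
#include <fcntl.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <unistd.h>
#include <linux/dma-heap.h>

static int load_file_into_dmabuf(const char *path, size_t fsz)
{
	int heap_fd = open("/dev/dma_heap/system", O_RDONLY | O_CLOEXEC);
	int file_fd = open(path, O_RDONLY);
	struct dma_heap_allocation_data alloc = {
		.len = fsz,
		.fd_flags = O_RDWR | O_CLOEXEC,
	};

	/* 1. allocate a dma-buf and get a dma-buf fd */
	if (ioctl(heap_fd, DMA_HEAP_IOCTL_ALLOC, &alloc) < 0)
		return -1;

	/* 2. mmap the dma-buf fd into a CPU address */
	void *vaddr = mmap(NULL, fsz, PROT_READ | PROT_WRITE, MAP_SHARED,
			   alloc.fd, 0);

	/* 3. read the file through the CPU mapping (buffered I/O, extra copy) */
	ssize_t n = read(file_fd, vaddr, fsz);

	munmap(vaddr, fsz);
	close(file_fd);
	close(heap_fd);
	return n == (ssize_t)fsz ? alloc.fd : -1;
}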
> This patch implements a feature called DMA_HEAP_IOCTL_ALLOC_READ_FILE.
> The user needs to offer a file_fd for the file they want to load into a
> dma-buf; then, if they get back a dma-buf fd, it is guaranteed to contain
> the file content.

Interesting idea, that has at least more potential than trying to enable direct I/O on mmap()ed DMA-bufs.
The approach with the new IOCTL might not work because it is a very specialized use case.
But IIRC there was a copy_file_range callback in the file_operations structure you could use for that. I'm just not sure when and how that's used with the copy_file_range() system call.
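For reference, the hook meant here is the .copy_file_range member of struct file_operations; a minimal, non-functional kernel-side sketch (the exporter names are hypothetical, nothing like this exists today) might look like:

/*
 * Sketch only: where an exporter could wire up copy_file_range.
 * my_exporter_copy_file_range and my_exporter_fops are made-up names.
 */
#include <linux/fs.h>

static ssize_t my_exporter_copy_file_range(struct file *file_in, loff_t pos_in,
					   struct file *file_out, loff_t pos_out,
					   size_t len, unsigned int flags)
{
	/*
	 * A real implementation would have to resolve file_out back to the
	 * exporter's backing storage, which is exactly what dma-buf does
	 * not expose (see the struct page discussion below).
	 */
	return -EOPNOTSUPP;
}

static const struct file_operations my_exporter_fops = {
	.copy_file_range = my_exporter_copy_file_range,
	/* ... other ops ... */
};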
I'm not sure any of those help, because internally they're all still based on struct page (or maybe in the future on folios). And that's the thing dma-buf can't give you, at least without peeking behind the curtain.
I think an entirely different option would be malloc+udmabuf. That essentially handles the impedance mismatch between direct I/O and dma-buf on the dma-buf side. The downside is that it'll make the permanently pinned memory accounting and tracking issues even more apparent, but I guess eventually we do need to sort that one out.
Oh, very good idea! Just one minor correction: it's not malloc+udmabuf, but rather memfd_create()+udmabuf.
Hm right, it's memfd_create() + mmap(memfd) + udmabuf
And you need to complete your direct I/O before creating the udmabuf since that reference will prevent direct I/O from working.
udmabuf will pin all the pages, so once the fd is returned, direct I/O can't be triggered on it (same as with a dma-buf). So the read must be completed before pinning.
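To make the ordering concrete, here is a minimal userspace sketch of the memfd_create() + direct I/O + udmabuf flow discussed above, assuming <linux/udmabuf.h> and /dev/udmabuf are available, offsets and sizes are already page- and block-aligned, and with error handling omitted:

/*
 * Sketch only: read the file into memfd-backed memory with O_DIRECT
 * *before* handing the memfd to udmabuf, which pins the pages.
 */
#define _GNU_SOURCE
#include <fcntl.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <unistd.h>
#include <linux/memfd.h>
#include <linux/udmabuf.h>

static int file_to_udmabuf(const char *path, size_t size)
{
	/* udmabuf requires a sealable memfd with F_SEAL_SHRINK set */
	int memfd = memfd_create("file-backing", MFD_ALLOW_SEALING);
	ftruncate(memfd, size);
	fcntl(memfd, F_ADD_SEALS, F_SEAL_SHRINK);

	/* complete the direct I/O read before the pages get pinned */
	void *vaddr = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED,
			   memfd, 0);
	int file_fd = open(path, O_RDONLY | O_DIRECT);
	read(file_fd, vaddr, size);	/* size assumed block-aligned */
	close(file_fd);
	munmap(vaddr, size);

	/* now wrap the memfd pages into a dma-buf via udmabuf */
	struct udmabuf_create create = {
		.memfd  = memfd,
		.flags  = UDMABUF_FLAGS_CLOEXEC,
		.offset = 0,
		.size   = size,
	};
	int dev_fd = open("/dev/udmabuf", O_RDWR);
	int dmabuf_fd = ioctl(dev_fd, UDMABUF_CREATE, &create);

	close(dev_fd);
	close(memfd);
	return dmabuf_fd;	/* a regular dma-buf fd for importers */
}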
Why does pinning prevent direct I/O? I haven't tested, but I'd expect the rdma folks would be really annoyed if that's the case ...
But the current udmabuf code uses `memfd_pin_folios` to speed up allocation and pinning, so this may need to be adapted to suit that.
I currently doubt that the udmabuf solution is suitable for our gigabyte-level read operations.
- The current mmap operation uses faulting, so frequent page faults will be
triggered during reads, resulting in a lot of context switching overhead.
- The current udmabuf size limit is 64MB; even though it can be changed, it may not be a good fit for large sizes.
Yeah that's just a figleaf so we don't have to bother about the accounting issue.
- Migrating and adapting the drivers is also a challenge, and it is currently not something we can control.
Why does a udmabuf fd not work instead of any other dmabuf fd? That shouldn't matter for the consuming driver ...
Hmm, our product's drivers are provided by other OEMs. I see that many of them implement their own dma_buf_ops, which may not be generic and may require them to reimplement things.
Yeah, for exporting a buffer object allocated by that driver. But any competent gles/vk stack also supports importing dma-buf, and that should work with udmabuf exactly the same way as with a dma-buf allocated from the system heap.
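As an illustration of that import path (assuming EGL/libdrm headers and the EGL_EXT_image_dma_buf_import extension; width, height, format and stride are placeholders), any dma-buf fd can be wrapped the same way:

/*
 * Sketch only: importing an existing dma-buf fd (udmabuf or otherwise)
 * through EGL_EXT_image_dma_buf_import. The attribute values must match
 * how the buffer was actually laid out.
 */
#include <EGL/egl.h>
#include <EGL/eglext.h>
#include <drm_fourcc.h>

static EGLImageKHR import_dmabuf(EGLDisplay dpy, int dmabuf_fd,
				 int width, int height, int stride)
{
	const EGLint attrs[] = {
		EGL_WIDTH,			width,
		EGL_HEIGHT,			height,
		EGL_LINUX_DRM_FOURCC_EXT,	DRM_FORMAT_ARGB8888,
		EGL_DMA_BUF_PLANE0_FD_EXT,	dmabuf_fd,
		EGL_DMA_BUF_PLANE0_OFFSET_EXT,	0,
		EGL_DMA_BUF_PLANE0_PITCH_EXT,	stride,
		EGL_NONE
	};

	PFNEGLCREATEIMAGEKHRPROC create_image =
		(PFNEGLCREATEIMAGEKHRPROC)eglGetProcAddress("eglCreateImageKHR");

	/* the importer does not care which exporter allocated the buffer */
	return create_image(dpy, EGL_NO_CONTEXT, EGL_LINUX_DMA_BUF_EXT,
			    NULL, attrs);
}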
Perhaps implementing `copy_file_range` would be more suitable for us.
See my other mail, fundamentally these all rely on struct page being present, and dma-buf doesn't give you that. Which means you need to go below the dma-buf abstraction. And udmabuf is pretty much the thing for that, because it wraps normal struct page memory into a dmabuf.
Yes, udmabuf gives us this. I am very interested in whether the pages provided by udmabuf can be used with direct I/O.
So I'll run a test and report back soon.
And copy_file_range on the underlying memfd might already work, I haven't checked though.
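One quick way to test that would be to copy_file_range() straight from the source file into the memfd backing the udmabuf; a sketch, assuming the setup from the earlier memfd example and block-aligned sizes:

/*
 * Sketch only: try copy_file_range() from a regular file into the memfd
 * that will back (or already backs) the udmabuf. Whether the filesystem
 * and shmem actually accelerate this is exactly what needs testing.
 */
#define _GNU_SOURCE
#include <sys/types.h>
#include <unistd.h>

static ssize_t fill_memfd_from_file(int file_fd, int memfd, size_t size)
{
	off64_t off_in = 0, off_out = 0;
	ssize_t total = 0;

	while ((size_t)total < size) {
		ssize_t n = copy_file_range(file_fd, &off_in,
					    memfd, &off_out,
					    size - total, 0);
		if (n <= 0)
			return n;	/* error or unexpected EOF */
		total += n;
	}
	return total;
}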
I have doubts.
I recently tested and found that I needed to modify many places in vfs_copy_file_range in order to make copy_file_range work with a DMA_BUF fd. (I have managed to get it working, but I don't think the implementation is good enough, so I can't provide the source code.)

I'm talking about memfd, not dma-buf here. I think copy_file_range to dma-buf is as architecturally unsound as allowing O_DIRECT on the dma-buf mmap.
Maybe memfd will work, maybe not; let's give it a test. :)
Anyway, it's a good idea too. I currently need to focus on whether it can be achieved, as well as the performance comparison.
Cheers, Sima
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch/