On Wed, Jun 23, 2021 at 10:57:35AM +0200, Christian König wrote:
No it isn't. It makes devices depend on allocating struct pages for their BARs, which is neither necessary nor desired.
Which dramatically reduces the cost of establishing DMA mappings, a loop of dma_map_resource() is very expensive.
Yeah, but that is perfectly ok. Our BAR allocations are either in chunks of at least 2MiB or only a single 4KiB page.
And very small apparently
Allocating struct pages has its use cases, for example for exposing VRAM as memory for HMM. But that is something very specific and should not limit PCIe P2P DMA in general.
Sure, but that is an ideal we are far from obtaining, and nobody wants to work on it, preferring to do hacky hacky like this.
If you believe in this then remove the scatter list from dmabuf, add a new set of dma_map* APIs to work on physical addresses and all the other stuff needed.
Yeah, that's what I totally agree on. And I actually hoped that the new P2P work for PCIe would go in that direction, but that didn't materialize.
It is a lot of work and the only gain is to save a bit of memory for struct pages. Not a very big payoff.
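As a rough illustration of the memory cost being weighed here, the following is a minimal sketch, not kernel code. It assumes 4 KiB pages and a 64-byte struct page, which is typical for x86-64 builds but is not stated in this thread; the macro and function names are hypothetical.

```c
#include <stdint.h>

/* Assumed values, not taken from the thread: 4 KiB pages and a 64-byte
 * struct page, as is typical on x86-64 builds. */
#define ASSUMED_PAGE_SIZE        4096ULL
#define ASSUMED_STRUCT_PAGE_SIZE 64ULL

/* Number of struct pages needed to cover a BAR of bar_bytes (rounded up). */
static uint64_t bar_nr_pages(uint64_t bar_bytes)
{
	return (bar_bytes + ASSUMED_PAGE_SIZE - 1) / ASSUMED_PAGE_SIZE;
}

/* Bytes of struct page metadata needed to cover the same BAR. */
static uint64_t bar_struct_page_bytes(uint64_t bar_bytes)
{
	return bar_nr_pages(bar_bytes) * ASSUMED_STRUCT_PAGE_SIZE;
}
```

Under these assumptions a 2 MiB chunk needs 512 struct pages (32 KiB of metadata) and a 16 GiB BAR needs 256 MiB, about 1.6% of the mapped size.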
But allocating struct pages for PCIe BARs, which are essentially registers and not memory, is much more hacky than the dma_map_resource() approach.
It doesn't really matter. The pages are in a special zone and are only being used as handles for the BAR memory.
By using PCIe P2P we want to avoid the round trip to the CPU when one device has filled the ring buffer and another device must be woken up to process it.
Sure, we all have these scenarios; what is inside the memory doesn't really matter. The mechanism is generic and the struct pages don't care much if they point at something memory-like or at something register-like.
They are already in big trouble because you can't portably use CPU instructions to access them anyhow.
Jason