On Mon, Jun 16, 2025 at 04:02:30PM +0100, Giovanni Cabiddu wrote:
> This level of performance is observed in userspace, where it is possible to (1) batch requests to amortize MMIO overhead (e.g., multiple requests per write), (2) submit requests asynchronously, (3) use flat buffers instead of scatter-gather lists, and (4) rely on polling rather than interrupts.
So is batching a large number of 4K requests sufficient to achieve the maximum throughput? Or does it require physically contiguous memory much greater than 4K in size?
Cheers,