From: Jian-Hong Pan
Sent: 26 July 2019 07:18
...
While allocating all 512 buffers in one block (just over 4MB) is probably not a good idea, you may need to allocate (and dma map) them in groups.
Thanks for reviewing, but I have some questions to double-check the idea. The original code allocates 512 skbs for the RX ring and DMA-maps them one by one, so the new code allocates a memory buffer 512 times to get 512 buffer arrays. Will the 512 buffer arrays be in one block? Do you mean aggregating the buffers into a scatterlist and using dma_map_sg?
If you malloc a buffer of size (8192+32) the allocator will either round it up to a whole number of (often 4k) pages or to a power of 2 of pages - so either 12k or 16k. I think the Linux allocator does the latter. Some of the allocators also 'steal' a bit from the front of the buffer for 'red tape'.
OTOH malloc the space for 15 buffers and the allocator will round the 15*(8192 + 32) up to 32*4k - and you waste under 8k across all the buffers.
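To put numbers on that, here is a rough user-space sketch of the rounding arithmetic. It assumes 4k pages and an allocator that rounds up to a power-of-two number of pages, and it ignores any 'red tape' the allocator takes; the helper names are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

#define PAGE_SZ   4096u            /* assumed page size */
#define RX_BUF_SZ (8192u + 32u)    /* buffer size from the discussion */

/* Round a byte count up to a whole number of pages. */
static size_t round_to_pages(size_t bytes)
{
    return (bytes + PAGE_SZ - 1) / PAGE_SZ * PAGE_SZ;
}

/* Round a byte count up to a power-of-two number of pages. */
static size_t round_to_pow2_pages(size_t bytes)
{
    size_t pages = (bytes + PAGE_SZ - 1) / PAGE_SZ;
    size_t p = 1;

    assert(bytes > 0);
    while (p < pages)
        p <<= 1;
    return p * PAGE_SZ;
}

/* Bytes lost to rounding when nbufs buffers share one allocation. */
static size_t group_waste(size_t nbufs)
{
    return round_to_pow2_pages(nbufs * RX_BUF_SZ) - nbufs * RX_BUF_SZ;
}
```

Under those assumptions group_waste(1) is 8160 bytes, i.e. nearly a whole extra buffer lost per buffer, while group_waste(15) is 7712 bytes shared between 15 buffers.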
You then dma_map the large buffer and split it into the actual rx buffers. Repeat until you've filled the entire ring. The only complication is remembering the base address (and size) for the dma_unmap and free, although there is plenty of padding to extend the buffer structure significantly without using more memory. Allocate in 15's and you (probably) have about 512 bytes spare per buffer. Allocate in 31's and you have about 230 bytes.
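Something like the following shows the bookkeeping, as a user-space sketch only: malloc() and free() stand in for the kernel allocation and the dma map/unmap, and struct rx_block, struct rx_slot and both functions are hypothetical names, not anything in the driver.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define RX_BUF_SZ (8192u + 32u)

/* One large allocation shared by several rx buffers (hypothetical). */
struct rx_block {
    unsigned char *base;   /* start of the allocation, needed to unmap and free */
    size_t size;           /* full size that was allocated (and would be mapped) */
    size_t nbufs;          /* rx buffers carved out of it */
};

/* One ring entry: points into its parent block. */
struct rx_slot {
    unsigned char *buf;
    struct rx_block *blk;
};

/*
 * Allocate one block for nbufs buffers and fill slots[0..nbufs-1].
 * Returns 0 on success, -1 if the allocation fails.
 */
static int rx_block_carve(struct rx_block *blk, struct rx_slot *slots,
                          size_t nbufs)
{
    size_t i;

    assert(nbufs > 0);
    blk->size = nbufs * RX_BUF_SZ;
    blk->base = malloc(blk->size);  /* stand-in for kernel alloc + dma map */
    if (!blk->base)
        return -1;
    blk->nbufs = nbufs;

    for (i = 0; i < nbufs; i++) {
        slots[i].buf = blk->base + i * RX_BUF_SZ;
        slots[i].blk = blk;
    }
    return 0;
}

/* Release the whole block; only base and size are needed here. */
static void rx_block_release(struct rx_block *blk)
{
    free(blk->base);  /* the dma unmap of (base, size) would come first */
    blk->base = NULL;
    blk->size = 0;
    blk->nbufs = 0;
}
```

Each ring entry keeps a pointer back to its block, so teardown can walk the blocks rather than the 512 individual buffers.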
The problem is that larger allocations are more likely to fail (especially if the system has been running for some time). So you almost certainly want to be able to fall back to smaller allocations even though they use more memory.
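One possible shape for that fallback, again as a user-space sketch: halve the group size until an allocation succeeds. The halving policy, the alloc_fn hook and alloc_fragmented() (a pretend allocator that refuses anything over 64k) are all assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define RX_BUF_SZ (8192u + 32u)

/* Allocator hook so a failing allocator can be tried out (hypothetical). */
typedef void *(*alloc_fn)(size_t bytes);

/* Pretend memory is fragmented: refuse any request over 64k. */
static void *alloc_fragmented(size_t bytes)
{
    return bytes > 65536 ? NULL : malloc(bytes);
}

/*
 * Allocate room for up to 'want' buffers in one block, halving the group
 * on failure. Stores the group size obtained in *got; returns NULL with
 * *got == 0 if even a single buffer cannot be allocated.
 */
static void *alloc_rx_group(alloc_fn alloc, size_t want, size_t *got)
{
    size_t n;

    assert(want > 0);
    for (n = want; n > 0; n /= 2) {
        void *p = alloc(n * RX_BUF_SZ);

        if (p) {
            *got = n;
            return p;
        }
    }
    *got = 0;
    return NULL;
}

/*
 * Fill a ring of ring_size buffers, at most group_max per block.
 * blocks[] needs room for ring_size entries (worst case: one buffer each).
 * Returns the number of blocks used, or 0 on failure after freeing them.
 */
static size_t fill_ring(alloc_fn alloc, size_t ring_size, size_t group_max,
                        void **blocks)
{
    size_t filled = 0, nblocks = 0;

    while (filled < ring_size) {
        size_t want = ring_size - filled;
        size_t got;

        if (want > group_max)
            want = group_max;
        blocks[nblocks] = alloc_rx_group(alloc, want, &got);
        if (!blocks[nblocks]) {
            while (nblocks > 0)
                free(blocks[--nblocks]);
            return 0;
        }
        nblocks++;
        filled += got;
    }
    return nblocks;
}
```

With plain malloc, fill_ring(malloc, 512, 15, blocks) uses 35 blocks (34 groups of 15 and one of 2). With alloc_fragmented the groups of 15 fail and it mostly ends up with groups of 7, so more blocks and more memory lost to rounding, but the ring still fills.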
I also wonder if you actually need 512 8k rx buffers to cover interrupt latency? I've not done any measurements for 20 years!
David