Subject: Re: Excessive memory usage when infiniband config is enabled
On Tue, May 07, 2024 at 05:24:51PM +0200, Zhu Yanjun wrote:
On 2024/5/7 15:32, Konstantin Taranov wrote:
Hello Leon,
I feel that it's a bug because I don't understand why this module/option is allocating 6 GB of RAM without any explicit configuration or
usage from us.
It's also worth mentioning that we are using the default linux-image from Debian bookworm, and it took us a long time to understand the reason behind this memory increase by bisecting the
kernel's config file.
Moreover, the documentation of the module doesn't mention anything regarding additional memory usage. We're talking about an increase of 6 GB, which is huge since we're not using the option. So is it expected behavior to have this much of an increase in memory consumption when activating the RDMA option, even if we're not using it? If that's the case, perhaps it would be good to mention this in the documentation.
Thank you
Hi Brian,
I do not think it is a bug. The high memory usage seems to come from these
lines:
    rsrc_size = irdma_calc_mem_rsrc_size(rf);
    rf->mem_rsrc = vzalloc(rsrc_size);
Exactly. The memory usage is related to the number of QPs. On irdma, with 4092 Queue Pairs and 8189 Completion Queues, the memory usage is about 4194302.
The "limits_sel" parameter of "modprobe irdma" changes the number of QPs. 0 selects the minimum, up to 124 QPs.
Please test with the command "modprobe irdma limits_sel=0" and let us know the results.
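One way to run the suggested test is to compare MemAvailable in /proc/meminfo before and after loading the module. The following is a minimal sketch, not a definitive procedure. It assumes the module is currently unloaded, that the script runs as root, and that the driver accepts the limits_sel parameter quoted above. The helper names are made up for illustration.

```python
import subprocess


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field: integer value}.

    Most fields are reported in kB; a few (e.g. HugePages_Total) are counts.
    """
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # skip lines without a "Name:" prefix
        parts = rest.split()
        if parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def mem_available_kb(path="/proc/meminfo"):
    """Read the current MemAvailable value in kB."""
    with open(path) as f:
        return parse_meminfo(f.read())["MemAvailable"]


def measure_module_cost_kb(modprobe_args):
    """Return how far MemAvailable dropped (kB) across a modprobe call.

    Assumption: device initialization has finished by the time modprobe
    returns; if it has not, the difference will understate the cost.
    """
    before = mem_available_kb()
    subprocess.run(["modprobe", *modprobe_args], check=True)
    after = mem_available_kb()
    return before - after


if __name__ == "__main__":
    # "limits_sel=0" is the setting suggested in this thread.
    cost = measure_module_cost_kb(["irdma", "limits_sel=0"])
    print(f"MemAvailable dropped by about {cost / 1024:.0f} MiB")
```

Running it once with "limits_sel=0" and once without the parameter (unloading the module in between) gives two numbers to compare. Other activity on the machine also moves MemAvailable, so the result is only a rough indication.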
It seems like a really unfortunate design choice in this driver to not have dynamic memory allocation.
Burning 6G on every server that has your HW, regardless of whether any RDMA apps are run, seems completely excessive.
So the driver is required to pre-allocate backing pages for HW context objects during device initialization, at least for the x722 and e800 series product lines.
And the amount of memory allocated is proportional, primarily, to the max QP count set up for the function.
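As a back-of-the-envelope illustration of that proportionality, the sketch below scales an observed footprint linearly with the max QP count. Strict linearity is an assumption (the relationship is described only as primarily proportional), and the inputs are simply the figures quoted in this thread: about 6 GB reported, 4092 QPs by default, and 124 QPs for the smallest profile. Whether the 6 GB belongs to one function or several is not stated, so treat the result as an order-of-magnitude estimate.

```python
def scaled_footprint_gb(observed_gb, observed_max_qp, new_max_qp):
    """Scale an observed memory footprint linearly with the max QP count.

    Assumption: memory is strictly proportional to max QP, which ignores
    any fixed overhead and any other resources that scale differently.
    """
    if observed_max_qp <= 0:
        raise ValueError("observed_max_qp must be positive")
    return observed_gb * new_max_qp / observed_max_qp


# Figures quoted in the thread: ~6 GB at 4092 QPs, 124 QPs in the smallest profile.
print(f"{scaled_footprint_gb(6.0, 4092, 124):.2f} GB")  # prints "0.18 GB"
```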
One option is to set a lower default profile upon driver loading, which will reduce the memory footprint but expose fewer QPs and other verb resources per ib_device, and to provide users with a devlink knob to choose a larger or smaller profile as they see fit.
This is roughly what the limits_sel module parameter Yanjun suggested does, but it is not available in the in-tree driver.
By the way, what is the specific Intel NIC model in use?
    lspci -vv | grep -E 'Eth.*Intel|Product'
Shiraz