On 4/4/22 03:41, David Hildenbrand wrote:
On 01.04.22 19:23, Mike Kravetz wrote:
On 4/1/22 03:43, David Hildenbrand wrote:
On 01.04.22 12:12, Peng Liu wrote:
Hugepages can be specified to pernode since "hugetlbfs: extend the definition of hugepages parameter to support node allocation", but the following problem is observed.
Confusing behavior is observed when both 1G and 2M hugepage is set after "numa=off". cmdline hugepage settings: hugepagesz=1G hugepages=0:3,1:3 hugepagesz=2M hugepages=0:1024,1:1024 results: HugeTLB registered 1.00 GiB page size, pre-allocated 0 pages HugeTLB registered 2.00 MiB page size, pre-allocated 1024 pages
Furthermore, confusing behavior can be also observed when invalid node behind valid node.
To fix this, hugetlb_hstate_alloc_pages should be called even when hugepages_setup going to invalid.
Shouldn't we bail out if someone requests node-specific allocations but we are not running with NUMA?
I thought about this as well, and could not come up with a good answer. Certainly, nobody SHOULD specify both 'numa=off' and ask for node specific allocations on the same command line. I would have no problem bailing out in such situations. But, I think that would also require the hugetlb command line processing to look for such situations.
Yes. Right now I see
if (tmp >= nr_online_nodes) goto invalid;
Which seems a little strange, because IIUC, it's the number of online nodes, which is completely wrong with a sparse online bitmap. Just imagine node 0 and node 2 are online, and node 1 is offline. Assuming that "node < 2" is valid is wrong.
Why don't we check for node_online() and bail out if that is not the case? Is it too early for that check? But why does comparing against nr_online_nodes() work, then?
Having that said, I'm not sure if all usage of nr_online_nodes in mm/hugetlb.c is wrong, with a sparse online bitmap. Outside of that, it's really just used for "nr_online_nodes > 1". I might be wrong, though.
I think you are correct. My bad for not being more thorough in reviewing the original patch that added this code. My incorrect assumption was that a sparse node map was only possible via offline operations which could not happen this early in boot. I now see that a sparse map can be presented by fw/bios/etc. So, yes I do believe we need to check for online nodes.