Sorry for the late reply, holidays ...
Did you ever try allocating a larger range with a single alloc_contig_range() call, that possibly has to migrate multiple hugetlb folios in one go (and maybe just allocates one of the just-freed hugetlb folios as migration target)?
I have tried using a single alloc_contig_range() call to allocate a larger contiguous range, and it works properly. This is because during the period between __alloc_contig_migrate_range() and isolate_freepages_range(), no one allocates a hugetlb folio from the free hugetlb pool.
Did you trigger the following as well?
I mean an alloc_contig_range() call that covers multiple in-use hugetlb pages, like:
[ huge 0 ] [ huge 1 ] [ huge 2 ] [ huge 3 ]
I assume the following happens:
To migrate huge 0, we have to allocate a fresh page from the buddy. After migration, we return now-free huge 0 to the pool.
To migrate huge 1, we can just grab now-free huge 0 from the pool, and not allocate a fresh one from the buddy.
At least that's my impression when reading alloc_migration_target()->alloc_hugetlb_folio_nodemask().
Or is available_huge_pages() false for some reason, so that we always end up in alloc_migrate_hugetlb_folio()->alloc_fresh_hugetlb_folio()?
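
To make the scenario concrete, here is a toy Python model of what I think happens. It is not kernel code: migrate_range() and its list-based "pool" are made up for illustration, and it bakes in my assumption that a source folio goes straight back to the free hugetlb pool after migration and that the pool is preferred over the buddy whenever it is non-empty.

```python
def migrate_range(in_use, pool_free):
    """Toy model: migrate each in-use hugetlb folio in the range, one by one.

    Returns a list of (source, target, origin) tuples.
    """
    pool = list(pool_free)  # free hugetlb folios currently in the pool
    fresh_id = 0
    log = []
    for src in in_use:
        if pool:
            # Assumed to correspond to available_huge_pages() being true:
            # take a free folio from the pool as the migration target.
            target, origin = pool.pop(0), "pool"
        else:
            # Assumed to correspond to the alloc_fresh_hugetlb_folio() path:
            # allocate a brand-new folio from the buddy.
            target, origin = f"fresh {fresh_id}", "buddy"
            fresh_id += 1
        log.append((src, target, origin))
        # Assumption: the migrated source is free now and returns to the pool.
        pool.append(src)
    return log


steps = migrate_range(["huge 0", "huge 1", "huge 2", "huge 3"], pool_free=[])
for src, target, origin in steps:
    print(f"{src} -> {target} (from {origin})")

# Migration targets that lie inside the range we are trying to allocate.
in_range = [target for _, target, _ in steps if target.startswith("huge")]
```

In this model only the first migration goes to the buddy. The targets for huge 1, huge 2 and huge 3 are huge 0, huge 1 and huge 2, that is, folios inside the very range alloc_contig_range() is trying to free up.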
Sorry for the stupid questions, the code is complicated, and I cannot see how this would work.