On 6 Feb 2025, at 3:01, Andrew Morton wrote:
On Tue, 4 Feb 2025 22:14:10 -0500, Zi Yan <ziy@nvidia.com> wrote:
This patchset adds a new buddy-allocator-like (or non-uniform) large folio split to reduce the total number of folios after a split and the amount of memory needed for multi-index xarray split, and to keep more large folios after a split.
It would be useful (vital, really) to provide some measurements which help others understand the magnitude of these resource savings, please.
Hi Andrew,
Can you please drop this series for now? After your request above, I found that I had misunderstood how xas_split_alloc() and xas_split() work in xarray. As a result, my current implementation allocates more xa_nodes than needed during a non-uniform split, although the excess ones are freed at the end. This defeats the purpose of reducing the memory consumption of multi-index xarray split, even though folio_split() has no functional issue AFAICT. I am working on a better implementation that might require new xarray operations, and I will post it later as v7. I really appreciate that you asked for more info above. :)
More details on the memory savings for multi-index xarray split during a non-uniform split compared to the existing uniform split (I will add this to the commit log in the next version):
The existing uniform split requires 2^(order % XA_CHUNK_SHIFT) xa_node allocations when the folio is split to order-0, whereas a non-uniform split requires at most 1 xa_node allocation. For example, splitting an order-9 folio uniformly needs 8 xa_nodes, since the folio occupies 8 multi-index slots in the xarray (with the default XA_CHUNK_SHIFT of 6, 2^(9 % 6) = 8). With a non-uniform split, only the slot containing the given struct page needs an xa_node after the split, which saves 7 xa_nodes.
Hi Matthew,
Would you mind checking my statement above on the xarray memory savings, and correcting me if I missed anything? Thanks.
Best Regards, Yan, Zi