Re: [PATCH V7] mm, compaction: don't use ALLOC_CMA for unmovable allocations

17 Dec 2024


Hello Yangge,
On Tue, Dec 17, 2024 at 07:46:44PM +0800, yangge1116@126.com wrote:
...
From: yangge yangge1116@126.com
Since commit 984fdba6a32e ("mm, compaction: use proper alloc_flags
in __compaction_suitable()") allows compaction to proceed when the free
pages required for compaction reside in the CMA pageblocks, it's
possible that __compaction_suitable() always returns true, and in
some cases that is not acceptable.
There are 4 NUMA nodes on my machine, and each NUMA node has 32GB
of memory. I have configured 16GB of CMA memory on each NUMA node,
and starting a 32GB virtual machine with device passthrough is
extremely slow, taking almost an hour.
During the start-up of the virtual machine, it will call
pin_user_pages_remote(..., FOLL_LONGTERM, ...) to allocate memory.
Long term GUP cannot allocate memory from the CMA area, so a maximum
of 16 GB of non-CMA memory on a NUMA node can be used as virtual
machine memory. Since there is 16 GB of free CMA memory on the NUMA
node, the watermark for order-0 is always met for compaction, so
__compaction_suitable() always returns true, even if the node is
unable to allocate non-CMA memory for the virtual machine.
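
To make the watermark argument above concrete, here is a minimal sketch of an order-0 check that either counts or ignores free CMA pages. It is a toy model, not kernel code: struct toy_zone, toy_watermark_ok() and the allow_cma parameter are hypothetical names, loosely modelled on how the kernel's watermark check discounts free CMA pages when ALLOC_CMA is not set. It assumes free_cma_pages never exceeds free_pages.

```c
#include <stdbool.h>

/* Toy model of one zone; every name here is hypothetical, not kernel API. */
struct toy_zone {
    unsigned long free_pages;     /* all free pages, CMA included */
    unsigned long free_cma_pages; /* free pages inside CMA pageblocks */
    unsigned long watermark;      /* pages that must remain free */
};

/*
 * Order-0 watermark check. With allow_cma false, free CMA pages are
 * not counted as usable. Assumes free_cma_pages <= free_pages.
 */
static bool toy_watermark_ok(const struct toy_zone *z, bool allow_cma)
{
    unsigned long usable = z->free_pages;

    if (!allow_cma)
        usable -= z->free_cma_pages;

    return usable > z->watermark;
}
```

With a zone whose only free memory is 16 GB of CMA (4194304 pages of 4 KB), the check passes when CMA pages are counted and fails when they are not, which is the discrepancy described above.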
For costly allocations, because __compaction_suitable() always
returns true, __alloc_pages_slowpath() can't exit at the appropriate
place, resulting in excessively long virtual machine startup times.
Call trace:
__alloc_pages_slowpath
    if (compact_result == COMPACT_SKIPPED ||
        compact_result == COMPACT_DEFERRED)
        goto nopage; // should exit __alloc_pages_slowpath() from here
Other unmovable allocations, like dma_buf, which can be large in a
Linux system, are also unable to allocate memory from CMA, and these
allocations suffer from the same problems described above. In order
to quickly fall back to a remote node, we should remove ALLOC_CMA both
in __compaction_suitable() and __isolate_free_page() for unmovable
allocations. After this fix, starting a 32GB virtual machine with
device passthrough takes only a few seconds.
The symptom is obviously bad, but I don't understand this fix.
The reason we do ALLOC_CMA is that, even for unmovable allocations,
you can create space in non-CMA space by moving migratable pages over
to CMA space. This is not a property we want to lose. But I also don't
see how it would interfere with your scenario.
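
The point about creating non-CMA space by moving migratable pages into CMA can be sketched with a toy page-accounting model. This is only an illustration under stated assumptions: struct toy_node and toy_migrate_to_cma() are hypothetical names, and real compaction works on pageblocks and migration scanners rather than on plain counters.

```c
/* Toy per-node accounting; every name here is hypothetical, not kernel API. */
struct toy_node {
    unsigned long noncma_free;    /* free pages outside CMA */
    unsigned long noncma_movable; /* movable pages currently outside CMA */
    unsigned long cma_free;       /* free pages inside CMA */
};

/*
 * Move up to 'want' movable pages from non-CMA memory into free CMA
 * pages. Each moved page frees one non-CMA page, which an unmovable
 * allocation could then use. Returns the number of pages moved.
 */
static unsigned long toy_migrate_to_cma(struct toy_node *n, unsigned long want)
{
    unsigned long moved = want;

    if (moved > n->noncma_movable)
        moved = n->noncma_movable;
    if (moved > n->cma_free)
        moved = n->cma_free;

    n->noncma_movable -= moved;
    n->cma_free -= moved;
    n->noncma_free += moved;
    return moved;
}
```

In this model free CMA pages are useful to an unmovable allocation only indirectly, as migration targets, and only while movable pages remain outside CMA to be moved.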
There is the compaction_suitable() check in should_compact_retry(),
but that only applies when the result is COMPACT_SKIPPED. IOW, it should
only happen when compaction_suitable() just now returned false, i.e. a
race condition. Which is why it's also not subject to limited retries.
What's the exact condition that traps the allocator inside the loop?
