On Fri, Dec 19, 2025 at 05:24:36AM +0800, Barry Song wrote:
On Thu, Dec 18, 2025 at 9:55 PM Uladzislau Rezki <urezki@gmail.com> wrote:
On Thu, Dec 18, 2025 at 02:01:56PM +0100, David Hildenbrand (Red Hat) wrote:
On 12/15/25 06:30, Barry Song wrote:
From: Barry Song <v-songbaohua@oppo.com>
In many cases, the pages passed to vmap() may include high-order pages allocated with __GFP_COMP. For example, the system heap often allocates pages in descending order: order 8, then order 4, then order 0. Currently, vmap() iterates over every page individually; even pages inside a high-order block are handled one by one.
This patch detects high-order pages and maps them as a single contiguous block whenever possible.
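For background, a heap-style exporter typically ends up with such a mixed-order pages[] array because every tail page of a compound allocation gets its own slot before vmap() is called. The sketch below is purely illustrative (the function name is made up and the logic is heavily simplified from drivers/dma-buf/heaps/system_heap.c; it is not part of this patch):

#include <linux/gfp.h>
#include <linux/mm.h>
#include <linux/slab.h>
#include <linux/vmalloc.h>

/* Simplified heap-style allocator: fill a pages[] array from descending
 * orders (8, then 4, then 0) and vmap() the whole thing. */
static void *example_heap_vmap(unsigned int nr_pages, gfp_t gfp)
{
	static const unsigned int orders[] = { 8, 4, 0 };
	struct page **pages, *page;
	unsigned int filled = 0, i, j;
	void *vaddr = NULL;

	pages = kvmalloc_array(nr_pages, sizeof(*pages), GFP_KERNEL);
	if (!pages)
		return NULL;

	while (filled < nr_pages) {
		page = NULL;
		for (i = 0; i < ARRAY_SIZE(orders); i++) {
			if (nr_pages - filled < (1U << orders[i]))
				continue;
			/* __GFP_COMP gives a compound (high-order) page */
			page = alloc_pages(gfp | __GFP_COMP, orders[i]);
			if (page)
				break;
		}
		if (!page)
			goto out;	/* freeing partial allocations elided */

		/* every tail page still occupies its own slot in the array */
		for (j = 0; j < (1U << orders[i]); j++)
			pages[filled++] = page + j;
	}

	/* vmap() currently walks this array one base page at a time */
	vaddr = vmap(pages, nr_pages, VM_MAP, PAGE_KERNEL);
out:
	kvfree(pages);
	return vaddr;
}

With the per-page loop, an order-8 block in that array is still processed as 256 separate entries; with this patch, __vmap_pages_range_noflush() recognises the compound page and hands the whole physically contiguous block to a single vmap_range_noflush() call.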
An alternative would be to implement a new API, vmap_sg(), but that change would be considerably larger in scope.
When vmapping a 128MB dma-buf allocated from the system heap, this patch makes system_heap_do_vmap() roughly 17× faster.
W/ patch:
[ 10.404769] system_heap_do_vmap took 2494000 ns
[ 12.525921] system_heap_do_vmap took 2467008 ns
[ 14.517348] system_heap_do_vmap took 2471008 ns
[ 16.593406] system_heap_do_vmap took 2444000 ns
[ 19.501341] system_heap_do_vmap took 2489008 ns
W/o patch:
[ 7.413756] system_heap_do_vmap took 42626000 ns
[ 9.425610] system_heap_do_vmap took 42500992 ns
[ 11.810898] system_heap_do_vmap took 42215008 ns
[ 14.336790] system_heap_do_vmap took 42134992 ns
[ 16.373890] system_heap_do_vmap took 42750000 ns
That's quite a speedup.
Cc: David Hildenbrand <david@kernel.org>
Cc: Uladzislau Rezki <urezki@gmail.com>
Cc: Sumit Semwal <sumit.semwal@linaro.org>
Cc: John Stultz <jstultz@google.com>
Cc: Maxime Ripard <mripard@kernel.org>
Tested-by: Tangquan Zheng <zhengtangquan@oppo.com>
Signed-off-by: Barry Song <v-songbaohua@oppo.com>
* diff with rfc:
  - Many code refinements based on David's suggestions, thanks!
  - Refine comment and changelog according to Uladzislau, thanks!
  - rfc link: https://lore.kernel.org/linux-mm/20251122090343.81243-1-21cnbao@gmail.com/
 mm/vmalloc.c | 45 +++++++++++++++++++++++++++++++++++++++------
 1 file changed, 39 insertions(+), 6 deletions(-)
diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index 41dd01e8430c..8d577767a9e5 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -642,6 +642,29 @@ static int vmap_small_pages_range_noflush(unsigned long addr, unsigned long end,
 	return err;
 }
 
+static inline int get_vmap_batch_order(struct page **pages,
+		unsigned int stride, unsigned int max_steps, unsigned int idx)
+{
+	int nr_pages = 1;
unsigned int, maybe
Right
Why are you initializing nr_pages when you overwrite it below?
Right, initializing nr_pages can be dropped.
+	/*
+	 * Currently, batching is only supported in vmap_pages_range
+	 * when page_shift == PAGE_SHIFT.
I don't know the code, so realizing how we go from page_shift to stride took me a second. Maybe only talk about stride here?
OTOH, is "stride" really the right terminology?
we calculate it as
stride = 1U << (page_shift - PAGE_SHIFT);
page_shift - PAGE_SHIFT should give us an "order". So is this a "granularity" in nr_pages?
This is the case where vmalloc() may realize that it has high-order pages and therefore calls vmap_pages_range_noflush() with a page_shift larger than PAGE_SHIFT. For vmap(), we take a pages array, so page_shift is always PAGE_SHIFT.
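(For concreteness, assuming 4 KiB base pages: a PMD-level vmalloc mapping passes page_shift = 21, so stride = 1U << (21 - 12) = 512, i.e. each loop iteration consumes 512 pages[] entries and maps 2 MiB. For vmap(), page_shift == PAGE_SHIFT and the stride is always 1.)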
Again, I don't know this code, so sorry for the question.
To me "stride" also sounds unclear.
Thanks, David and Uladzislau. On second thought, this stride may be redundant, and it should be possible to drop it entirely. This results in the code below:
diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index 41dd01e8430c..3962bdcb43e5 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -642,6 +642,20 @@ static int vmap_small_pages_range_noflush(unsigned long addr, unsigned long end,
 	return err;
 }
 
+static inline int get_vmap_batch_order(struct page **pages,
+		unsigned int max_steps, unsigned int idx)
+{
+	unsigned int nr_pages = compound_nr(pages[idx]);
+
+	if (nr_pages == 1 || max_steps < nr_pages)
+		return 0;
+
+	if (num_pages_contiguous(&pages[idx], nr_pages) == nr_pages)
+		return compound_order(pages[idx]);
+
+	return 0;
+}
+
 /*
  * vmap_pages_range_noflush is similar to vmap_pages_range, but does not
  * flush caches.
@@ -658,20 +672,35 @@ int __vmap_pages_range_noflush(unsigned long addr, unsigned long end,
 
 	WARN_ON(page_shift < PAGE_SHIFT);
 
+	/*
+	 * For vmap(), users may allocate pages from high orders down to
+	 * order 0, while always using PAGE_SHIFT as the page_shift.
+	 * We first check whether the initial page is a compound page. If so,
+	 * there may be an opportunity to batch multiple pages together.
+	 */
 	if (!IS_ENABLED(CONFIG_HAVE_ARCH_HUGE_VMALLOC) ||
-			page_shift == PAGE_SHIFT)
+			(page_shift == PAGE_SHIFT && !PageCompound(pages[0])))
 		return vmap_small_pages_range_noflush(addr, end, prot, pages);
Hm.. If the first few pages are order-0 and the rest are compound, then we do nothing: the !PageCompound(pages[0]) check sends the whole array down vmap_small_pages_range_noflush().
-	for (i = 0; i < nr; i += 1U << (page_shift - PAGE_SHIFT)) {
+	for (i = 0; i < nr; ) {
 		int err;
+		unsigned int shift = page_shift;
 
-		err = vmap_range_noflush(addr, addr + (1UL << page_shift),
+		/*
+		 * For vmap() cases, page_shift is always PAGE_SHIFT; even so,
+		 * the pages may be physically contiguous and can still be
+		 * mapped in a batch.
+		 */
+		if (page_shift == PAGE_SHIFT)
+			shift += get_vmap_batch_order(pages, nr - i, i);
+
+		err = vmap_range_noflush(addr, addr + (1UL << shift),
 					page_to_phys(pages[i]), prot,
-					page_shift);
+					shift);
 		if (err)
 			return err;
 
-		addr += 1UL << page_shift;
+		addr += 1UL << shift;
+		i += 1U << (shift - PAGE_SHIFT);
 	}
 
 	return 0;
Does this look clearer?
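As a side note for readers who have not seen the helper: the batching above leans on num_pages_contiguous(), which (roughly) reports how many of the first nr entries of pages[] are physically consecutive, starting at pages[0]. A rough sketch of that idea, with a made-up name; the in-tree helper may be implemented differently:

/* Illustrative only; the real num_pages_contiguous() may differ.
 * Count how many of the first @nr entries of @pages have consecutive
 * page frame numbers, starting from pages[0]. */
static inline unsigned long example_num_pages_contiguous(struct page **pages,
							  unsigned long nr)
{
	unsigned long pfn = page_to_pfn(pages[0]);
	unsigned long i;

	for (i = 1; i < nr; i++)
		if (page_to_pfn(pages[i]) != pfn + i)
			break;

	return i;
}

So get_vmap_batch_order() only reports a batch when all compound_nr(pages[idx]) entries are contiguous in the array; otherwise it returns 0 and the page at idx is mapped at order 0.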
The concern is that we mix it with the huge page mapping path. If we want to batch v-mapping for the page_shift == PAGE_SHIFT case, where the "pages" array may contain compound pages (folios) (a corner case to me), I think we should split it.
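For illustration only (this sketch is not from the thread, and the helper name is made up): such a split could keep the existing loop for page_shift > PAGE_SHIFT untouched and route the page_shift == PAGE_SHIFT, possibly-compound case through its own function, roughly:

/* Rough sketch of the suggested split -- not the posted patch. */
static int example_vmap_compound_pages_range_noflush(unsigned long addr,
		unsigned long end, pgprot_t prot, struct page **pages)
{
	unsigned int i, nr = (end - addr) >> PAGE_SHIFT;

	for (i = 0; i < nr; ) {
		unsigned int shift = PAGE_SHIFT +
				     get_vmap_batch_order(pages, nr - i, i);
		int err;

		err = vmap_range_noflush(addr, addr + (1UL << shift),
					 page_to_phys(pages[i]), prot, shift);
		if (err)
			return err;

		addr += 1UL << shift;
		i += 1U << (shift - PAGE_SHIFT);
	}

	return 0;
}

__vmap_pages_range_noflush() could then call such a helper only when page_shift == PAGE_SHIFT and the array starts with a compound page, leaving the plain order-0 case on the vmap_small_pages_range_noflush() path and the huge-mapping loop as it is today.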
-- Uladzislau Rezki