__vmap_pages_range_noflush() assumes that all pages in its pages** argument share the same page shift. However, since commit e9c3cda4d86e ("mm, vmalloc: fix high order __GFP_NOFAIL allocations"), if gfp_flags includes __GFP_NOFAIL and a high-order allocation in vm_area_alloc_pages() fails, vm_area_alloc_pages() falls back to order-0. As a result, pages** may contain pages with two different page shifts (high order and order-0). This can cause __vmap_pages_range_noflush() to create incorrect mappings, potentially resulting in memory corruption.
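A minimal userspace model of the mapping problem (not kernel code): pages[] holds one pfn per base page, the way vm_area_alloc_pages() records it, and a mapper that trusts a single page_shift reads only the first entry of each block and assumes the rest is physically contiguous. The 4K/2M sizes, the pfn values and the helper name count_bad_mappings() are assumptions for illustration.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_PAGE_SHIFT 12 /* assumed 4K base pages */
#define SKETCH_PMD_SHIFT 21  /* assumed 2M PMD mapping */

/*
 * Hypothetical model of a mapper that, like __vmap_pages_range_noflush()
 * with page_shift > PAGE_SHIFT, only looks at the first entry of each
 * block and maps the whole block physically contiguous from there.
 * pages[] holds one pfn per base page. Returns the number of base pages
 * whose virtual slot ends up backed by the wrong pfn.
 */
static size_t count_bad_mappings(const unsigned long *pages, size_t nr,
				 unsigned int page_shift)
{
	size_t step, bad = 0;

	assert(page_shift >= SKETCH_PAGE_SHIFT);
	step = (size_t)1 << (page_shift - SKETCH_PAGE_SHIFT);

	for (size_t i = 0; i < nr; i += step)
		for (size_t j = 0; j < step && i + j < nr; j++)
			/* slot i + j gets pages[i] + j, but should get pages[i + j] */
			if (pages[i] + j != pages[i + j])
				bad++;
	return bad;
}
```

For example, 512 entries taken from one order-9 block (pfn 0x1000 + i) give 0 bad slots at page_shift 21, while 512 scattered order-0 pages (pfn 0x8000 + 2 * i) give 511: only the first slot of the block lines up.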
Users might encounter this as follows (vmap_allow_huge = true, 2M is PMD_SIZE):

kvmalloc(2M, __GFP_NOFAIL|GFP_X)
  __vmalloc_node_range_noprof(vm_flags=VM_ALLOW_HUGE_VMAP)
    vm_area_alloc_pages(order=9) ---> order-9 allocation fails and falls back to order-0
    vmap_pages_range()
      vmap_pages_range_noflush()
        __vmap_pages_range_noflush(page_shift = 21) ----> wrong mapping happens
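The allocation side of this call chain can be modelled the same way. This is a simplified userspace sketch of the pre-fix fallback loop, not the real vm_area_alloc_pages(): the allocator, its high-order budget and all sketch_* names are hypothetical. One successful order-9 block followed by order-0 fallbacks fills the array completely, yet the second 2M block is not physically contiguous, which is what the page_shift = 21 mapping then trips over.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical allocator: only `high_order_budget` high-order requests
 * succeed. Order-0 requests always succeed (standing in for __GFP_NOFAIL)
 * and leave a one-pfn gap, so consecutive order-0 pages are never
 * physically contiguous.
 */
struct sketch_allocator {
	unsigned long next_pfn;
	unsigned int high_order_budget;
};

static bool sketch_alloc(struct sketch_allocator *a, unsigned int order,
			 unsigned long *pfn)
{
	if (order) {
		if (a->high_order_budget == 0)
			return false;
		a->high_order_budget--;
	}
	*pfn = a->next_pfn;
	a->next_pfn += (1UL << order) + (order ? 0 : 1);
	return true;
}

/*
 * Model of the pre-fix loop for a __GFP_NOFAIL request: when a high-order
 * allocation fails it drops to order-0 and keeps going. nr_pages is assumed
 * to be a multiple of 1 << order, as the rounded-up area size guarantees.
 */
static size_t model_alloc_with_fallback(struct sketch_allocator *a,
					unsigned int order, unsigned long *pages,
					size_t nr_pages)
{
	size_t nr_allocated = 0;
	unsigned long pfn;

	while (nr_allocated < nr_pages) {
		if (!sketch_alloc(a, order, &pfn)) {
			order = 0; /* fall back to the zero order allocations */
			continue;
		}
		/* record one entry per base page, like the kernel loop does */
		for (size_t i = 0; i < (1UL << order); i++)
			pages[nr_allocated + i] = pfn + i;
		nr_allocated += 1UL << order;
	}
	return nr_allocated;
}

/* True if every block of 2^shift_delta entries is physically contiguous. */
static bool uniform_for_shift(const unsigned long *pages, size_t nr,
			      unsigned int shift_delta)
{
	size_t step = (size_t)1 << shift_delta;

	for (size_t i = 0; i < nr; i += step)
		for (size_t j = 1; j < step && i + j < nr; j++)
			if (pages[i + j] != pages[i] + j)
				return false;
	return true;
}
```

With a budget of one order-9 block and 1024 requested pages, the loop returns 1024 (so NOFAIL is honoured), but uniform_for_shift(pages, 1024, 9) is false while uniform_for_shift(pages, 1024, 0) is true.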
We can remove the fallback code, because if a high-order allocation fails, __vmalloc_node_range_noprof() will retry the whole allocation with order-0. Falling back to order-0 here is therefore unnecessary, so fix this by removing the fallback code.
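The retry this relies on lives in the caller rather than in the page loop. A hypothetical userspace sketch of that control flow, assuming (as stated above) that __vmalloc_node_range_noprof() redoes the whole request with order-0 when the huge attempt cannot be populated, and that the request is large enough for a PMD mapping; all sketch_* names are made up:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_PAGE_SHIFT 12 /* assumed 4K base pages */
#define SKETCH_PMD_SHIFT 21  /* assumed 2M PMD mapping */

/* Hypothetical environment: high-order blocks are available or not;
 * order-0 pages always are. */
struct sketch_env {
	bool high_order_ok;
};

/* Round size up to a multiple of 1 << shift. */
static size_t sketch_align(size_t size, unsigned int shift)
{
	size_t mask = ((size_t)1 << shift) - 1;

	return (size + mask) & ~mask;
}

/* Stand-in for the patched page loop: stop at the first failure and
 * report how many base pages were populated. */
static size_t sketch_fill(const struct sketch_env *env, unsigned int order,
			  size_t nr_pages)
{
	size_t nr = 0;

	while (nr < nr_pages) {
		if (order && !env->high_order_ok)
			break;
		nr += (size_t)1 << order;
	}
	return nr;
}

/*
 * Stand-in for the caller: try a PMD-sized mapping first and, if the area
 * cannot be fully populated, redo the whole request with PAGE_SHIFT and the
 * unrounded size. Returns the page shift the area would be mapped with
 * (0 if even order-0 failed).
 */
static unsigned int sketch_vmalloc(const struct sketch_env *env, size_t real_size)
{
	unsigned int shift = SKETCH_PMD_SHIFT;

	for (;;) {
		size_t size = sketch_align(real_size, shift);
		size_t nr_pages = size >> SKETCH_PAGE_SHIFT;

		if (sketch_fill(env, shift - SKETCH_PAGE_SHIFT, nr_pages) == nr_pages)
			return shift; /* every entry in pages[] matches shift */
		if (shift == SKETCH_PAGE_SHIFT)
			return 0;
		shift = SKETCH_PAGE_SHIFT; /* retry the whole area with order-0 */
	}
}
```

The point of this structure is that the page shift is chosen once per attempt, so pages[] and the mapping cannot disagree: a 2M request maps with shift 21 when high-order pages are available and with shift 12 when they are not.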
Fixes: e9c3cda4d86e ("mm, vmalloc: fix high order __GFP_NOFAIL allocations")
Signed-off-by: Hailong Liu <hailong.liu@oppo.com>
Reported-by: Tangquan.Zheng <zhengtangquan@oppo.com>
Cc: stable@vger.kernel.org
CC: Barry Song <21cnbao@gmail.com>
CC: Baoquan He <bhe@redhat.com>
CC: Matthew Wilcox <willy@infradead.org>
---
 mm/vmalloc.c     | 11 ++---------
 mm/vmalloc.c.rej | 10 ++++++++++
 2 files changed, 12 insertions(+), 9 deletions(-)
 create mode 100644 mm/vmalloc.c.rej
diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index 6b783baf12a1..af2de36549d6 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -3584,15 +3584,8 @@ vm_area_alloc_pages(gfp_t gfp, int nid,
 			page = alloc_pages_noprof(alloc_gfp, order);
 		else
 			page = alloc_pages_node_noprof(nid, alloc_gfp, order);
-		if (unlikely(!page)) {
-			if (!nofail)
-				break;
-
-			/* fall back to the zero order allocations */
-			alloc_gfp |= __GFP_NOFAIL;
-			order = 0;
-			continue;
-		}
+		if (unlikely(!page))
+			break;
 
 		/*
 		 * Higher order allocations must be able to be treated as
diff --git a/mm/vmalloc.c.rej b/mm/vmalloc.c.rej
new file mode 100644
index 000000000000..c28017088319
--- /dev/null
+++ b/mm/vmalloc.c.rej
@@ -0,0 +1,10 @@
+--- mm/vmalloc.c
++++ mm/vmalloc.c
+@@ -3000,6 +3005,7 @@ vm_area_alloc_pages(gfp_t gfp, int nid,
+ 	unsigned int nr_allocated = 0;
+ 	gfp_t alloc_gfp = gfp;
+ 	bool nofail = false;
++	bool fallback = false;
+ 	struct page *page;
+ 	int i;
+ 
---
Baoquan suggested setting page_shift to 0 on fallback in [2], and raised a concern about the performance of retrying with order-0. But IMO, retrying:
- saves memory if the high-order allocation fails;
- keeps the alignment and the page shift consistent;
- makes use of the bulk allocator for order-0.
[2] https://lore.kernel.org/lkml/20240725035318.471-1-hailong.liu@oppo.com/
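One reading of the first point, as plain arithmetic: assuming the huge attempt rounds the size up to a multiple of 1 << shift and the order-0 retry goes back to the page-aligned requested size, an in-place fallback still populates the rounded-up area with order-0 pages, while a retry only populates what was asked for. The helper names and the 4K/2M sizes are assumptions.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_PAGE_SHIFT 12 /* assumed 4K base pages */
#define SKETCH_PMD_SHIFT 21  /* assumed 2M PMD mapping */

/* Round x up to a multiple of 1 << shift. */
static size_t align_up(size_t x, unsigned int shift)
{
	size_t mask = ((size_t)1 << shift) - 1;

	return (x + mask) & ~mask;
}

/* Base pages populated when the area keeps its huge-rounded size but is
 * filled with order-0 pages (in-place fallback). */
static size_t pages_with_inplace_fallback(size_t real_size)
{
	return align_up(real_size, SKETCH_PMD_SHIFT) >> SKETCH_PAGE_SHIFT;
}

/* Base pages populated when the whole request is retried with PAGE_SHIFT
 * and the original, page-aligned size. */
static size_t pages_with_retry(size_t real_size)
{
	return align_up(real_size, SKETCH_PAGE_SHIFT) >> SKETCH_PAGE_SHIFT;
}
```

For a 2M + 4K request this is 1024 base pages with the in-place fallback versus 513 with the retry; for an exact 2M request both give 512.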
On Thu 08-08-24 20:00:58, Hailong Liu wrote:
>  mm/vmalloc.c     | 11 ++---------
>  mm/vmalloc.c.rej | 10 ++++++++++

What is this?

>  2 files changed, 12 insertions(+), 9 deletions(-)
>  create mode 100644 mm/vmalloc.c.rej
> diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> index 6b783baf12a1..af2de36549d6 100644
> --- a/mm/vmalloc.c
> +++ b/mm/vmalloc.c
> @@ -3584,15 +3584,8 @@ vm_area_alloc_pages(gfp_t gfp, int nid,
>  			page = alloc_pages_noprof(alloc_gfp, order);
>  		else
>  			page = alloc_pages_node_noprof(nid, alloc_gfp, order);
> -		if (unlikely(!page)) {
> -			if (!nofail)
> -				break;
> -
> -			/* fall back to the zero order allocations */
> -			alloc_gfp |= __GFP_NOFAIL;
> -			order = 0;
> -			continue;
> -		}
> +		if (unlikely(!page))
> +			break;
This just makes the NOFAIL allocation fail. So this is not a correct fix.
>  
>  		/*
>  		 * Higher order allocations must be able to be treated as
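The objection above can be reproduced with a simplified model of the page loop (userspace, hypothetical names): for a __GFP_NOFAIL request whose high-order attempts start failing, the pre-patch loop keeps going with order-0 and fills the array, while the patched loop stops and returns fewer pages than requested. The sketch says nothing about what the caller does afterwards; it only shows the loop-level behaviour.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical allocator: only `high_order_budget` high-order requests
 * succeed; order-0 requests always succeed. */
struct sketch_allocator {
	unsigned int high_order_budget;
};

static bool sketch_alloc(struct sketch_allocator *a, unsigned int order)
{
	if (!order)
		return true;
	if (a->high_order_budget == 0)
		return false;
	a->high_order_budget--;
	return true;
}

/*
 * Model of the page loop for a __GFP_NOFAIL, high-order request.
 * With `fallback` set it behaves like the pre-patch loop (drop to order-0
 * and keep going); without it, like the patched loop (stop on failure).
 * Returns the number of base pages populated.
 */
static size_t model_nofail_alloc(struct sketch_allocator *a, unsigned int order,
				 size_t nr_pages, bool fallback)
{
	size_t nr_allocated = 0;

	while (nr_allocated < nr_pages) {
		if (!sketch_alloc(a, order)) {
			if (!fallback)
				break; /* patched: the NOFAIL request comes back short */
			order = 0;
			continue;
		}
		nr_allocated += (size_t)1 << order;
	}
	return nr_allocated;
}
```

With one order-9 block available and 1024 pages requested, the fallback variant returns 1024 and the patched variant returns 512.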
On Fri 09-08-24 11:30:32, Michal Hocko wrote:
> This just makes the NOFAIL allocation fail. So this is not a correct fix.
OK, I can see a newer version