[RESEND PATCH v1] mm/vmalloc: fix page mapping if vm_area_alloc_pages() with high order fallback to order 0

List overview All Threads
Download

newer

older

[PATCH 1/2] drm/v3d: Disable...

[PATCH] vfs: fix race between...

Hailong Liu

8 Aug 2024 8 Aug '24

12:19 p.m.

The __vmap_pages_range_noflush() assumes its argument pages** contains pages with the same page shift. However, since commit e9c3cda4d86e ("mm, vmalloc: fix high order __GFP_NOFAIL allocations"), if gfp_flags includes __GFP_NOFAIL with high order in vm_area_alloc_pages() and page allocation failed for high order, the pages** may contain two different page shifts (high order and order-0). This could lead __vmap_pages_range_noflush() to perform incorrect mappings, potentially resulting in memory corruption.

Users might encounter this as follows (vmap_allow_huge = true, 2M is for PMD_SIZE): kvmalloc(2M, __GFP_NOFAIL|GFP_X) __vmalloc_node_range_noprof(vm_flags=VM_ALLOW_HUGE_VMAP) vm_area_alloc_pages(order=9) ---> order-9 allocation failed and fallback to order-0 vmap_pages_range() vmap_pages_range_noflush() __vmap_pages_range_noflush(page_shift = 21) ----> wrong mapping happens

We can remove the fallback code because if a high-order allocation fails, __vmalloc_node_range_noprof() will retry with order-0. Therefore, it is unnecessary to fallback to order-0 here. Therefore, fix this by removing the fallback code.

Fixes: e9c3cda4d86e ("mm, vmalloc: fix high order __GFP_NOFAIL allocations") Signed-off-by: Hailong Liu hailong.liu@oppo.com Reported-by: Tangquan Zheng zhengtangquan@oppo.com Cc: stable@vger.kernel.org CC: Barry Song 21cnbao@gmail.com CC: Baoquan He bhe@redhat.com CC: Matthew Wilcox willy@infradead.org --- mm/vmalloc.c | 11 ++--------- 1 file changed, 2 insertions(+), 9 deletions(-)

diff --git a/mm/vmalloc.c b/mm/vmalloc.c index 6b783baf12a1..af2de36549d6 100644 --- a/mm/vmalloc.c +++ b/mm/vmalloc.c @@ -3584,15 +3584,8 @@ vm_area_alloc_pages(gfp_t gfp, int nid, page = alloc_pages_noprof(alloc_gfp, order); else page = alloc_pages_node_noprof(nid, alloc_gfp, order); - if (unlikely(!page)) { - if (!nofail) - break; - - /* fall back to the zero order allocations */ - alloc_gfp |= __GFP_NOFAIL; - order = 0; - continue; - } + if (unlikely(!page)) + break;

/* * Higher order allocations must be able to be treated as --- Sorry for fat fingers. with .rej file. resend this.

Baoquan suggests set page_shift to 0 if fallback in (2 and concern about performance of retry with order-0. But IMO with retry, - Save memory usage if high order allocation failed. - Keep consistancy with align and page-shift. - make use of bulk allocator with order-0

[2] https://lore.kernel.org/lkml/20240725035318.471-1-hailong.liu@oppo.com/ -- 2.30.0

Show replies by date

Baoquan He

8 Aug 8 Aug

1:01 p.m.

On 08/08/24 at 08:19pm, Hailong Liu wrote:

...

The __vmap_pages_range_noflush() assumes its argument pages** contains pages with the same page shift. However, since commit e9c3cda4d86e ("mm, vmalloc: fix high order __GFP_NOFAIL allocations"), if gfp_flags includes __GFP_NOFAIL with high order in vm_area_alloc_pages() and page allocation failed for high order, the pages** may contain two different page shifts (high order and order-0). This could lead __vmap_pages_range_noflush() to perform incorrect mappings, potentially resulting in memory corruption.

Users might encounter this as follows (vmap_allow_huge = true, 2M is for PMD_SIZE): kvmalloc(2M, __GFP_NOFAIL|GFP_X) __vmalloc_node_range_noprof(vm_flags=VM_ALLOW_HUGE_VMAP) vm_area_alloc_pages(order=9) ---> order-9 allocation failed and fallback to order-0 vmap_pages_range() vmap_pages_range_noflush() __vmap_pages_range_noflush(page_shift = 21) ----> wrong mapping happens

We can remove the fallback code because if a high-order allocation fails, __vmalloc_node_range_noprof() will retry with order-0. Therefore, it is unnecessary to fallback to order-0 here. Therefore, fix this by removing the fallback code.

Fixes: e9c3cda4d86e ("mm, vmalloc: fix high order __GFP_NOFAIL allocations") Signed-off-by: Hailong Liu hailong.liu@oppo.com Reported-by: Tangquan Zheng zhengtangquan@oppo.com Cc: stable@vger.kernel.org CC: Barry Song 21cnbao@gmail.com CC: Baoquan He bhe@redhat.com CC: Matthew Wilcox willy@infradead.org

mm/vmalloc.c | 11 ++--------- 1 file changed, 2 insertions(+), 9 deletions(-)

diff --git a/mm/vmalloc.c b/mm/vmalloc.c index 6b783baf12a1..af2de36549d6 100644 --- a/mm/vmalloc.c +++ b/mm/vmalloc.c @@ -3584,15 +3584,8 @@ vm_area_alloc_pages(gfp_t gfp, int nid, page = alloc_pages_noprof(alloc_gfp, order); else page = alloc_pages_node_noprof(nid, alloc_gfp, order);
if (unlikely(!page)) {
	if (!nofail)
		break;
	/* fall back to the zero order allocations */
	alloc_gfp |= __GFP_NOFAIL;
	order = 0;
	continue;
}
if (unlikely(!page))
	break;

This looks great, thanks.

Reviewed-by: Baoquan He bhe@redhat.com

Uladzislau Rezki

2:57 p.m.

On Thu, Aug 08, 2024 at 08:19:56PM +0800, Hailong Liu wrote:

...

The __vmap_pages_range_noflush() assumes its argument pages** contains pages with the same page shift. However, since commit e9c3cda4d86e ("mm, vmalloc: fix high order __GFP_NOFAIL allocations"), if gfp_flags includes __GFP_NOFAIL with high order in vm_area_alloc_pages() and page allocation failed for high order, the pages** may contain two different page shifts (high order and order-0). This could lead __vmap_pages_range_noflush() to perform incorrect mappings, potentially resulting in memory corruption.

Users might encounter this as follows (vmap_allow_huge = true, 2M is for PMD_SIZE): kvmalloc(2M, __GFP_NOFAIL|GFP_X) __vmalloc_node_range_noprof(vm_flags=VM_ALLOW_HUGE_VMAP) vm_area_alloc_pages(order=9) ---> order-9 allocation failed and fallback to order-0 vmap_pages_range() vmap_pages_range_noflush() __vmap_pages_range_noflush(page_shift = 21) ----> wrong mapping happens

We can remove the fallback code because if a high-order allocation fails, __vmalloc_node_range_noprof() will retry with order-0. Therefore, it is unnecessary to fallback to order-0 here. Therefore, fix this by removing the fallback code.

Fixes: e9c3cda4d86e ("mm, vmalloc: fix high order __GFP_NOFAIL allocations") Signed-off-by: Hailong Liu hailong.liu@oppo.com Reported-by: Tangquan Zheng zhengtangquan@oppo.com Cc: stable@vger.kernel.org CC: Barry Song 21cnbao@gmail.com CC: Baoquan He bhe@redhat.com CC: Matthew Wilcox willy@infradead.org

mm/vmalloc.c | 11 ++--------- 1 file changed, 2 insertions(+), 9 deletions(-)

diff --git a/mm/vmalloc.c b/mm/vmalloc.c index 6b783baf12a1..af2de36549d6 100644 --- a/mm/vmalloc.c +++ b/mm/vmalloc.c @@ -3584,15 +3584,8 @@ vm_area_alloc_pages(gfp_t gfp, int nid, page = alloc_pages_noprof(alloc_gfp, order); else page = alloc_pages_node_noprof(nid, alloc_gfp, order);
if (unlikely(!page)) {
	if (!nofail)
		break;
	/* fall back to the zero order allocations */
	alloc_gfp |= __GFP_NOFAIL;
	order = 0;
	continue;
}
if (unlikely(!page))
	break;
/*

Higher order allocations must be able to be treated as
Sorry for fat fingers. with .rej file. resend this.

Baoquan suggests set page_shift to 0 if fallback in (2 and concern about performance of retry with order-0. But IMO with retry,

Save memory usage if high order allocation failed.

Keep consistancy with align and page-shift.

make use of bulk allocator with order-0

[2] https://lore.kernel.org/lkml/20240725035318.471-1-hailong.liu@oppo.com/

2.30.0

Makes sense to me.

Reviewed-by: Uladzislau Rezki (Sony) urezki@gmail.com

Thank you!

-- Uladzislau Rezki

Barry Song

9:05 p.m.

On Fri, Aug 9, 2024 at 12:20 AM Hailong Liu hailong.liu@oppo.com wrote:

...

The __vmap_pages_range_noflush() assumes its argument pages** contains pages with the same page shift. However, since commit e9c3cda4d86e ("mm, vmalloc: fix high order __GFP_NOFAIL allocations"), if gfp_flags includes __GFP_NOFAIL with high order in vm_area_alloc_pages() and page allocation failed for high order, the pages** may contain two different page shifts (high order and order-0). This could lead __vmap_pages_range_noflush() to perform incorrect mappings, potentially resulting in memory corruption.

Users might encounter this as follows (vmap_allow_huge = true, 2M is for PMD_SIZE): kvmalloc(2M, __GFP_NOFAIL|GFP_X) __vmalloc_node_range_noprof(vm_flags=VM_ALLOW_HUGE_VMAP) vm_area_alloc_pages(order=9) ---> order-9 allocation failed and fallback to order-0 vmap_pages_range() vmap_pages_range_noflush() __vmap_pages_range_noflush(page_shift = 21) ----> wrong mapping happens

We can remove the fallback code because if a high-order allocation fails, __vmalloc_node_range_noprof() will retry with order-0. Therefore, it is unnecessary to fallback to order-0 here. Therefore, fix this by removing the fallback code.

Fixes: e9c3cda4d86e ("mm, vmalloc: fix high order __GFP_NOFAIL allocations") Signed-off-by: Hailong Liu hailong.liu@oppo.com Reported-by: Tangquan Zheng zhengtangquan@oppo.com Cc: stable@vger.kernel.org CC: Barry Song 21cnbao@gmail.com CC: Baoquan He bhe@redhat.com CC: Matthew Wilcox willy@infradead.org

Acked-by: Barry Song baohua@kernel.org

because we already have a fallback here:

void *__vmalloc_node_range_noprof :

fail: if (shift > PAGE_SHIFT) { shift = PAGE_SHIFT; align = real_align; size = real_size; goto again; }

...

mm/vmalloc.c | 11 ++--------- 1 file changed, 2 insertions(+), 9 deletions(-)

diff --git a/mm/vmalloc.c b/mm/vmalloc.c index 6b783baf12a1..af2de36549d6 100644 --- a/mm/vmalloc.c +++ b/mm/vmalloc.c @@ -3584,15 +3584,8 @@ vm_area_alloc_pages(gfp_t gfp, int nid, page = alloc_pages_noprof(alloc_gfp, order); else page = alloc_pages_node_noprof(nid, alloc_gfp, order);
          if (unlikely(!page)) {
                  if (!nofail)
                          break;
                  /* fall back to the zero order allocations */
                  alloc_gfp |= __GFP_NOFAIL;
                  order = 0;
                  continue;
          }
          if (unlikely(!page))
                  break;

          /*
           * Higher order allocations must be able to be treated as
Sorry for fat fingers. with .rej file. resend this.

Baoquan suggests set page_shift to 0 if fallback in (2 and concern about performance of retry with order-0. But IMO with retry,

Save memory usage if high order allocation failed.

Keep consistancy with align and page-shift.

make use of bulk allocator with order-0

[2] https://lore.kernel.org/lkml/20240725035318.471-1-hailong.liu@oppo.com/

2.30.0

Michal Hocko

9 Aug 9 Aug

9:33 a.m.

On Fri 09-08-24 09:05:05, Barry Song wrote:

...

On Fri, Aug 9, 2024 at 12:20 AM Hailong Liu hailong.liu@oppo.com wrote:

...
The __vmap_pages_range_noflush() assumes its argument pages** contains pages with the same page shift. However, since commit e9c3cda4d86e ("mm, vmalloc: fix high order __GFP_NOFAIL allocations"), if gfp_flags includes __GFP_NOFAIL with high order in vm_area_alloc_pages() and page allocation failed for high order, the pages** may contain two different page shifts (high order and order-0). This could lead __vmap_pages_range_noflush() to perform incorrect mappings, potentially resulting in memory corruption.

Users might encounter this as follows (vmap_allow_huge = true, 2M is for PMD_SIZE): kvmalloc(2M, __GFP_NOFAIL|GFP_X) __vmalloc_node_range_noprof(vm_flags=VM_ALLOW_HUGE_VMAP) vm_area_alloc_pages(order=9) ---> order-9 allocation failed and fallback to order-0 vmap_pages_range() vmap_pages_range_noflush() __vmap_pages_range_noflush(page_shift = 21) ----> wrong mapping happens

We can remove the fallback code because if a high-order allocation fails, __vmalloc_node_range_noprof() will retry with order-0. Therefore, it is unnecessary to fallback to order-0 here. Therefore, fix this by removing the fallback code.

Fixes: e9c3cda4d86e ("mm, vmalloc: fix high order __GFP_NOFAIL allocations") Signed-off-by: Hailong Liu hailong.liu@oppo.com Reported-by: Tangquan Zheng zhengtangquan@oppo.com Cc: stable@vger.kernel.org CC: Barry Song 21cnbao@gmail.com CC: Baoquan He bhe@redhat.com CC: Matthew Wilcox willy@infradead.org

Acked-by: Barry Song baohua@kernel.org

because we already have a fallback here:

void *__vmalloc_node_range_noprof :

fail: if (shift > PAGE_SHIFT) { shift = PAGE_SHIFT; align = real_align; size = real_size; goto again; }

This really deserves a comment because this is not really clear at all. The code is also fragile and it would benefit from some re-org.

Thanks for the fix.

Acked-by: Michal Hocko mhocko@suse.com

-- Michal Hocko SUSE Labs

Uladzislau Rezki

9:41 a.m.

On Fri, Aug 09, 2024 at 11:33:06AM +0200, Michal Hocko wrote:

...

On Fri 09-08-24 09:05:05, Barry Song wrote:

...
On Fri, Aug 9, 2024 at 12:20 AM Hailong Liu hailong.liu@oppo.com wrote:

...
The __vmap_pages_range_noflush() assumes its argument pages** contains pages with the same page shift. However, since commit e9c3cda4d86e ("mm, vmalloc: fix high order __GFP_NOFAIL allocations"), if gfp_flags includes __GFP_NOFAIL with high order in vm_area_alloc_pages() and page allocation failed for high order, the pages** may contain two different page shifts (high order and order-0). This could lead __vmap_pages_range_noflush() to perform incorrect mappings, potentially resulting in memory corruption.

Users might encounter this as follows (vmap_allow_huge = true, 2M is for PMD_SIZE): kvmalloc(2M, __GFP_NOFAIL|GFP_X) __vmalloc_node_range_noprof(vm_flags=VM_ALLOW_HUGE_VMAP) vm_area_alloc_pages(order=9) ---> order-9 allocation failed and fallback to order-0 vmap_pages_range() vmap_pages_range_noflush() __vmap_pages_range_noflush(page_shift = 21) ----> wrong mapping happens

We can remove the fallback code because if a high-order allocation fails, __vmalloc_node_range_noprof() will retry with order-0. Therefore, it is unnecessary to fallback to order-0 here. Therefore, fix this by removing the fallback code.

Fixes: e9c3cda4d86e ("mm, vmalloc: fix high order __GFP_NOFAIL allocations") Signed-off-by: Hailong Liu hailong.liu@oppo.com Reported-by: Tangquan Zheng zhengtangquan@oppo.com Cc: stable@vger.kernel.org CC: Barry Song 21cnbao@gmail.com CC: Baoquan He bhe@redhat.com CC: Matthew Wilcox willy@infradead.org

Acked-by: Barry Song baohua@kernel.org

because we already have a fallback here:

void *__vmalloc_node_range_noprof :

fail: if (shift > PAGE_SHIFT) { shift = PAGE_SHIFT; align = real_align; size = real_size; goto again; }

This really deserves a comment because this is not really clear at all. The code is also fragile and it would benefit from some re-org.

Thanks for the fix.

Acked-by: Michal Hocko mhocko@suse.com

I agree. This is only clear for people who know the code. A "fallback" to order-0 should be commented.

-- Uladzislau Rezki

Andrew Morton

16 Aug 16 Aug

5:07 a.m.

On Fri, 9 Aug 2024 11:41:42 +0200 Uladzislau Rezki urezki@gmail.com wrote:

...

...
...
Acked-by: Barry Song baohua@kernel.org

because we already have a fallback here:

void *__vmalloc_node_range_noprof :

fail: if (shift > PAGE_SHIFT) { shift = PAGE_SHIFT; align = real_align; size = real_size; goto again; }

This really deserves a comment because this is not really clear at all. The code is also fragile and it would benefit from some re-org.

Thanks for the fix.

Acked-by: Michal Hocko mhocko@suse.com

I agree. This is only clear for people who know the code. A "fallback" to order-0 should be commented.

It's been a week. Could someone please propose a fixup patch to add this comment?

Uladzislau Rezki

7:19 a.m.

On Thu, Aug 15, 2024 at 10:07:09PM -0700, Andrew Morton wrote:

...

On Fri, 9 Aug 2024 11:41:42 +0200 Uladzislau Rezki urezki@gmail.com wrote:

...
...
...
Acked-by: Barry Song baohua@kernel.org

because we already have a fallback here:

void *__vmalloc_node_range_noprof :

fail: if (shift > PAGE_SHIFT) { shift = PAGE_SHIFT; align = real_align; size = real_size; goto again; }

This really deserves a comment because this is not really clear at all. The code is also fragile and it would benefit from some re-org.

Thanks for the fix.

Acked-by: Michal Hocko mhocko@suse.com

I agree. This is only clear for people who know the code. A "fallback" to order-0 should be commented.

It's been a week. Could someone please propose a fixup patch to add this comment?

I will send the patch. This is week i have a vacation, thus i am a bit slow.

-- Uladzislau Rezki

Hailong Liu

9:12 a.m.

On Thu, 15. Aug 22:07, Andrew Morton wrote:

...

On Fri, 9 Aug 2024 11:41:42 +0200 Uladzislau Rezki urezki@gmail.com wrote:

...
...
...
Acked-by: Barry Song baohua@kernel.org

because we already have a fallback here:

void *__vmalloc_node_range_noprof :

fail: if (shift > PAGE_SHIFT) { shift = PAGE_SHIFT; align = real_align; size = real_size; goto again; }

This really deserves a comment because this is not really clear at all. The code is also fragile and it would benefit from some re-org.

Thanks for the fix.

Acked-by: Michal Hocko mhocko@suse.com

I agree. This is only clear for people who know the code. A "fallback" to order-0 should be commented.

It's been a week. Could someone please propose a fixup patch to add this comment?

Hi Andrew:

Do you mean that I need to send a v2 patch with the the comments included?

-- Brs, hailong.

Uladzislau Rezki

10:13 a.m.

On Fri, Aug 16, 2024 at 05:12:32PM +0800, Hailong Liu wrote:

...

On Thu, 15. Aug 22:07, Andrew Morton wrote:

...
On Fri, 9 Aug 2024 11:41:42 +0200 Uladzislau Rezki urezki@gmail.com wrote:

...
...
...
Acked-by: Barry Song baohua@kernel.org

because we already have a fallback here:

void *__vmalloc_node_range_noprof :

fail: if (shift > PAGE_SHIFT) { shift = PAGE_SHIFT; align = real_align; size = real_size; goto again; }

This really deserves a comment because this is not really clear at all. The code is also fragile and it would benefit from some re-org.

Thanks for the fix.

Acked-by: Michal Hocko mhocko@suse.com

I agree. This is only clear for people who know the code. A "fallback" to order-0 should be commented.

It's been a week. Could someone please propose a fixup patch to add this comment?

Hi Andrew:

Do you mean that I need to send a v2 patch with the the comments included?

It is better to post v2.

But before, could you please comment on:

in case of order-0, bulk path may easily fail and fallback to the single page allocator. If an request is marked as NO_FAIL, i am talking about order-0 request, your change breaks GFP_NOFAIL for !order.

Am i missing something obvious?

Thanks!

-- Uladzsislau Rezki

Hailong Liu

11:46 a.m.

On Fri, 16. Aug 12:13, Uladzislau Rezki wrote:

...

On Fri, Aug 16, 2024 at 05:12:32PM +0800, Hailong Liu wrote:

...
On Thu, 15. Aug 22:07, Andrew Morton wrote:

...
On Fri, 9 Aug 2024 11:41:42 +0200 Uladzislau Rezki urezki@gmail.com wrote:

...
...
...
Acked-by: Barry Song baohua@kernel.org

because we already have a fallback here:

void *__vmalloc_node_range_noprof :

fail: if (shift > PAGE_SHIFT) { shift = PAGE_SHIFT; align = real_align; size = real_size; goto again; }

This really deserves a comment because this is not really clear at all. The code is also fragile and it would benefit from some re-org.

Thanks for the fix.

Acked-by: Michal Hocko mhocko@suse.com

I agree. This is only clear for people who know the code. A "fallback" to order-0 should be commented.

It's been a week. Could someone please propose a fixup patch to add this comment?

Hi Andrew:

Do you mean that I need to send a v2 patch with the the comments included?

It is better to post v2.

Got it.

...

But before, could you please comment on:

in case of order-0, bulk path may easily fail and fallback to the single page allocator. If an request is marked as NO_FAIL, i am talking about order-0 request, your change breaks GFP_NOFAIL for !order.

Am i missing something obvious?

For order-0, alloc_pages(GFP_X | __GFP_NOFAIL, 0), buddy allocator will handle the flag correctly. IMO we don't need to handle the flag here.

...

Thanks!

-- Uladzsislau Rezki

-- help you, help me, Hailong.

Michal Hocko

12:32 p.m.

On Fri 16-08-24 19:46:26, Hailong Liu wrote:

...

On Fri, 16. Aug 12:13, Uladzislau Rezki wrote:

...
On Fri, Aug 16, 2024 at 05:12:32PM +0800, Hailong Liu wrote:

...
On Thu, 15. Aug 22:07, Andrew Morton wrote:

...
On Fri, 9 Aug 2024 11:41:42 +0200 Uladzislau Rezki urezki@gmail.com wrote:

...
...
> Acked-by: Barry Song baohua@kernel.org > > because we already have a fallback here: > > void *__vmalloc_node_range_noprof : > > fail: > if (shift > PAGE_SHIFT) { > shift = PAGE_SHIFT; > align = real_align; > size = real_size; > goto again; > }

This really deserves a comment because this is not really clear at all. The code is also fragile and it would benefit from some re-org.

Thanks for the fix.

Acked-by: Michal Hocko mhocko@suse.com

I agree. This is only clear for people who know the code. A "fallback" to order-0 should be commented.

It's been a week. Could someone please propose a fixup patch to add this comment?

Hi Andrew:

Do you mean that I need to send a v2 patch with the the comments included?

It is better to post v2.

Got it.

...
But before, could you please comment on:

in case of order-0, bulk path may easily fail and fallback to the single page allocator. If an request is marked as NO_FAIL, i am talking about order-0 request, your change breaks GFP_NOFAIL for !order.

Am i missing something obvious?

For order-0, alloc_pages(GFP_X | __GFP_NOFAIL, 0), buddy allocator will handle the flag correctly. IMO we don't need to handle the flag here.

Let me clarify what I would like to have clarified:

diff --git a/mm/vmalloc.c b/mm/vmalloc.c index 6b783baf12a1..fea90a39f5c5 100644 --- a/mm/vmalloc.c +++ b/mm/vmalloc.c @@ -3510,13 +3510,13 @@ void *vmap_pfn(unsigned long *pfns, unsigned int count, pgprot_t prot) EXPORT_SYMBOL_GPL(vmap_pfn); #endif /* CONFIG_VMAP_PFN */

+/* GFP_NOFAIL semantic is implemented by __vmalloc_node_range_noprof */ static inline unsigned int vm_area_alloc_pages(gfp_t gfp, int nid, unsigned int order, unsigned int nr_pages, struct page **pages) { unsigned int nr_allocated = 0; - gfp_t alloc_gfp = gfp; - bool nofail = gfp & __GFP_NOFAIL; + gfp_t alloc_gfp = gfp & ~ __GFP_NOFAIL; struct page *page; int i;

@@ -3527,9 +3527,6 @@ vm_area_alloc_pages(gfp_t gfp, int nid, * more permissive. */ if (!order) { - /* bulk allocator doesn't support nofail req. officially */ - gfp_t bulk_gfp = gfp & ~__GFP_NOFAIL; - while (nr_allocated < nr_pages) { unsigned int nr, nr_pages_request;

@@ -3547,12 +3544,12 @@ vm_area_alloc_pages(gfp_t gfp, int nid, * but mempolicy wants to alloc memory by interleaving. */ if (IS_ENABLED(CONFIG_NUMA) && nid == NUMA_NO_NODE) - nr = alloc_pages_bulk_array_mempolicy_noprof(bulk_gfp, + nr = alloc_pages_bulk_array_mempolicy_noprof(alloc_gfp, nr_pages_request, pages + nr_allocated);

else - nr = alloc_pages_bulk_array_node_noprof(bulk_gfp, nid, + nr = alloc_pages_bulk_array_node_noprof(alloc_gfp, nid, nr_pages_request, pages + nr_allocated);

@@ -3566,13 +3563,6 @@ vm_area_alloc_pages(gfp_t gfp, int nid, if (nr != nr_pages_request) break; } - } else if (gfp & __GFP_NOFAIL) { - /* - * Higher order nofail allocations are really expensive and - * potentially dangerous (pre-mature OOM, disruptive reclaim - * and compaction etc. - */ - alloc_gfp &= ~__GFP_NOFAIL; }

/* High-order pages or fallback path if "bulk" fails. */

-- Michal Hocko SUSE Labs

Uladzislau Rezki

23 Aug 23 Aug

4:42 p.m.

Hello, Michal.

...

Let me clarify what I would like to have clarified:

diff --git a/mm/vmalloc.c b/mm/vmalloc.c index 6b783baf12a1..fea90a39f5c5 100644 --- a/mm/vmalloc.c +++ b/mm/vmalloc.c @@ -3510,13 +3510,13 @@ void *vmap_pfn(unsigned long *pfns, unsigned int count, pgprot_t prot) EXPORT_SYMBOL_GPL(vmap_pfn); #endif /* CONFIG_VMAP_PFN */ +/* GFP_NOFAIL semantic is implemented by __vmalloc_node_range_noprof */ static inline unsigned int vm_area_alloc_pages(gfp_t gfp, int nid, unsigned int order, unsigned int nr_pages, struct page **pages) { unsigned int nr_allocated = 0;

gfp_t alloc_gfp = gfp;

bool nofail = gfp & __GFP_NOFAIL;

gfp_t alloc_gfp = gfp & ~ __GFP_NOFAIL; struct page *page; int i;

@@ -3527,9 +3527,6 @@ vm_area_alloc_pages(gfp_t gfp, int nid, * more permissive. */ if (!order) {
/* bulk allocator doesn't support nofail req. officially */
gfp_t bulk_gfp = gfp & ~__GFP_NOFAIL;
while (nr_allocated < nr_pages) { unsigned int nr, nr_pages_request;
@@ -3547,12 +3544,12 @@ vm_area_alloc_pages(gfp_t gfp, int nid, * but mempolicy wants to alloc memory by interleaving. */ if (IS_ENABLED(CONFIG_NUMA) && nid == NUMA_NO_NODE)
		nr = alloc_pages_bulk_array_mempolicy_noprof(bulk_gfp,
		nr = alloc_pages_bulk_array_mempolicy_noprof(alloc_gfp,
				nr_pages_request,
				pages + nr_allocated);
else
		nr = alloc_pages_bulk_array_node_noprof(bulk_gfp, nid,
		nr = alloc_pages_bulk_array_node_noprof(alloc_gfp, nid,
				nr_pages_request,
				pages + nr_allocated);
@@ -3566,13 +3563,6 @@ vm_area_alloc_pages(gfp_t gfp, int nid, if (nr != nr_pages_request) break; }
} else if (gfp & __GFP_NOFAIL) {
/*
 * Higher order nofail allocations are really expensive and
 * potentially dangerous (pre-mature OOM, disruptive reclaim
 * and compaction etc.
 */
alloc_gfp &= ~__GFP_NOFAIL;
}
/* High-order pages or fallback path if "bulk" fails. */ --

See below the change. It does not do any functional change and it is rather a small refactoring, which includes the comment i wanted to add and what you wanted to be clarified(if i got you correctly):

<snip> diff --git a/mm/vmalloc.c b/mm/vmalloc.c index 3f9b6bd707d2..24fad2e48799 100644 --- a/mm/vmalloc.c +++ b/mm/vmalloc.c @@ -3531,8 +3531,6 @@ vm_area_alloc_pages(gfp_t gfp, int nid, unsigned int order, unsigned int nr_pages, struct page **pages) { unsigned int nr_allocated = 0; - gfp_t alloc_gfp = gfp; - bool nofail = gfp & __GFP_NOFAIL; struct page *page; int i;

@@ -3543,9 +3541,6 @@ vm_area_alloc_pages(gfp_t gfp, int nid, * more permissive. */ if (!order) { - /* bulk allocator doesn't support nofail req. officially */ - gfp_t bulk_gfp = gfp & ~__GFP_NOFAIL; - while (nr_allocated < nr_pages) { unsigned int nr, nr_pages_request;

@@ -3563,12 +3558,12 @@ vm_area_alloc_pages(gfp_t gfp, int nid, * but mempolicy wants to alloc memory by interleaving. */ if (IS_ENABLED(CONFIG_NUMA) && nid == NUMA_NO_NODE) - nr = alloc_pages_bulk_array_mempolicy_noprof(bulk_gfp, + nr = alloc_pages_bulk_array_mempolicy_noprof(gfp & ~__GFP_NOFAIL, nr_pages_request, pages + nr_allocated); - else - nr = alloc_pages_bulk_array_node_noprof(bulk_gfp, nid, + /* bulk allocator doesn't support nofail req. officially */ + nr = alloc_pages_bulk_array_node_noprof(gfp & ~__GFP_NOFAIL, nid, nr_pages_request, pages + nr_allocated);

@@ -3582,24 +3577,18 @@ vm_area_alloc_pages(gfp_t gfp, int nid, if (nr != nr_pages_request) break; } - } else if (gfp & __GFP_NOFAIL) { - /* - * Higher order nofail allocations are really expensive and - * potentially dangerous (pre-mature OOM, disruptive reclaim - * and compaction etc. - */ - alloc_gfp &= ~__GFP_NOFAIL; }

/* High-order pages or fallback path if "bulk" fails. */ while (nr_allocated < nr_pages) { - if (!nofail && fatal_signal_pending(current)) + if (!(gfp & __GFP_NOFAIL) && fatal_signal_pending(current)) break;

if (nid == NUMA_NO_NODE) - page = alloc_pages_noprof(alloc_gfp, order); + page = alloc_pages_noprof(gfp, order); else - page = alloc_pages_node_noprof(nid, alloc_gfp, order); + page = alloc_pages_node_noprof(nid, gfp, order); + if (unlikely(!page)) break;

@@ -3666,7 +3655,16 @@ static void *__vmalloc_area_node(struct vm_struct *area, gfp_t gfp_mask, set_vm_area_page_order(area, page_shift - PAGE_SHIFT); page_order = vm_area_page_order(area);

- area->nr_pages = vm_area_alloc_pages(gfp_mask | __GFP_NOWARN, + /* + * Higher order nofail allocations are really expensive and + * potentially dangerous (pre-mature OOM, disruptive reclaim + * and compaction etc. + * + * Please note, the __vmalloc_node_range_noprof() falls-back + * to order-0 pages if high-order attempt has been unsuccessful. + */ + area->nr_pages = vm_area_alloc_pages(page_order ? + gfp_mask &= ~__GFP_NOFAIL : gfp_mask | __GFP_NOWARN, node, page_order, nr_small_pages, area->pages);

atomic_long_add(area->nr_pages, &nr_vmalloc_pages); <snip>

Is that aligned with your wish?

Thanks!

-- Uladzislau Rezki

Michal Hocko

26 Aug 26 Aug

7:52 a.m.

On Fri 23-08-24 18:42:47, Uladzislau Rezki wrote: [...]

...

@@ -3666,7 +3655,16 @@ static void *__vmalloc_area_node(struct vm_struct *area, gfp_t gfp_mask, set_vm_area_page_order(area, page_shift - PAGE_SHIFT); page_order = vm_area_page_order(area);

area->nr_pages = vm_area_alloc_pages(gfp_mask | __GFP_NOWARN,
/*
* Higher order nofail allocations are really expensive and
* potentially dangerous (pre-mature OOM, disruptive reclaim
* and compaction etc.
*
* Please note, the __vmalloc_node_range_noprof() falls-back
* to order-0 pages if high-order attempt has been unsuccessful.
*/
area->nr_pages = vm_area_alloc_pages(page_order ?
gfp_mask &= ~__GFP_NOFAIL : gfp_mask | __GFP_NOWARN,
node, page_order, nr_small_pages, area->pages);
atomic_long_add(area->nr_pages, &nr_vmalloc_pages);

<snip>

Is that aligned with your wish?

I am not a great fan of modifying gfp_mask inside the ternary operator like that. It makes the code harder to read. Is there any actual reason to simply drop GFP_NOFAIL unconditionally and rely do the NOFAIL handling for all orders at the same place?

Not that I care about this much TBH. It is an improvement to drop all the NOFAIL specifics from vm_area_alloc_pages.

-- Michal Hocko SUSE Labs

Uladzislau Rezki

12:38 p.m.

On Mon, Aug 26, 2024 at 09:52:42AM +0200, Michal Hocko wrote:

...

On Fri 23-08-24 18:42:47, Uladzislau Rezki wrote: [...]

...
@@ -3666,7 +3655,16 @@ static void *__vmalloc_area_node(struct vm_struct *area, gfp_t gfp_mask, set_vm_area_page_order(area, page_shift - PAGE_SHIFT); page_order = vm_area_page_order(area);

area->nr_pages = vm_area_alloc_pages(gfp_mask | __GFP_NOWARN,
/*
* Higher order nofail allocations are really expensive and
* potentially dangerous (pre-mature OOM, disruptive reclaim
* and compaction etc.
*
* Please note, the __vmalloc_node_range_noprof() falls-back
* to order-0 pages if high-order attempt has been unsuccessful.
*/
area->nr_pages = vm_area_alloc_pages(page_order ?
gfp_mask &= ~__GFP_NOFAIL : gfp_mask | __GFP_NOWARN,
node, page_order, nr_small_pages, area->pages);
atomic_long_add(area->nr_pages, &nr_vmalloc_pages);

<snip>

Is that aligned with your wish?
I am not a great fan of modifying gfp_mask inside the ternary operator like that. It makes the code harder to read. Is there any actual reason to simply drop GFP_NOFAIL unconditionally and rely do the NOFAIL handling for all orders at the same place?

1. So, for bulk we have below:

/* gfp_t bulk_gfp = gfp & ~__GFP_NOFAIL; */

I am not sure if we need it but it says it does not support it which is not clear for me why we have to drop __GFP_NOFAIL for bulk(). There is a fallback to a single page allocator. If passing __GFP_NOFAIL does not trigger any warning or panic a system, then i do not follow why we drop that flag.

Is that odd?

2. High-order allocations. Do you think we should not care much about it when __GFP_NOFAIL is set? Same here, there is a fallback for order-0 if "high" fails, it is more likely NO_FAIL succeed for order-0. Thus keeping NOFAIL for high-order sounds like not a good approach to me.

3. "... at the same place?" Do you mean in the __vmalloc_node_range_noprof()?

__vmalloc_node_range_noprof() -> __vmalloc_area_node(gfp_mask) -> vm_area_alloc_pages()

if, so it is not straight forward, i.e. there is one more allocation:

<snip> static void *__vmalloc_area_node(struct vm_struct *area, gfp_t gfp_mask, pgprot_t prot, unsigned int page_shift, int node) { ... /* Please note that the recursion is strictly bounded. */ if (array_size > PAGE_SIZE) { area->pages = __vmalloc_node_noprof(array_size, 1, nested_gfp, node, area->caller); } else { area->pages = kmalloc_node_noprof(array_size, nested_gfp, node); } ... } <snip>

whereas it is easier to do it inside of the __vmalloc_area_node().

...

Not that I care about this much TBH. It is an improvement to drop all the NOFAIL specifics from vm_area_alloc_pages.

I agree. I also do not like modifying gfp flags on different levels and different cases. To me there is only one case. It is high-order requests with NOFAIL. For this i think we should keep our approach, i mean dropping NOFAIL and repeat because we have a fallback.

-- Uladzislau Rezki

Michal Hocko

27 Aug 27 Aug

6:49 a.m.

On Mon 26-08-24 14:38:40, Uladzislau Rezki wrote:

...

On Mon, Aug 26, 2024 at 09:52:42AM +0200, Michal Hocko wrote:

...
On Fri 23-08-24 18:42:47, Uladzislau Rezki wrote: [...]

...
@@ -3666,7 +3655,16 @@ static void *__vmalloc_area_node(struct vm_struct *area, gfp_t gfp_mask, set_vm_area_page_order(area, page_shift - PAGE_SHIFT); page_order = vm_area_page_order(area);

area->nr_pages = vm_area_alloc_pages(gfp_mask | __GFP_NOWARN,
/*
* Higher order nofail allocations are really expensive and
* potentially dangerous (pre-mature OOM, disruptive reclaim
* and compaction etc.
*
* Please note, the __vmalloc_node_range_noprof() falls-back
* to order-0 pages if high-order attempt has been unsuccessful.
*/
area->nr_pages = vm_area_alloc_pages(page_order ?
gfp_mask &= ~__GFP_NOFAIL : gfp_mask | __GFP_NOWARN,
node, page_order, nr_small_pages, area->pages);
atomic_long_add(area->nr_pages, &nr_vmalloc_pages);

<snip>

Is that aligned with your wish?
I am not a great fan of modifying gfp_mask inside the ternary operator like that. It makes the code harder to read. Is there any actual reason to simply drop GFP_NOFAIL unconditionally and rely do the NOFAIL handling for all orders at the same place?
So, for bulk we have below:

/* gfp_t bulk_gfp = gfp & ~__GFP_NOFAIL; */

I am not sure if we need it but it says it does not support it which is not clear for me why we have to drop __GFP_NOFAIL for bulk(). There is a fallback to a single page allocator. If passing __GFP_NOFAIL does not trigger any warning or panic a system, then i do not follow why we drop that flag.

Is that odd?

I suspect this was a pre-caution more than anything.

...

High-order allocations. Do you think we should not care much about

it when __GFP_NOFAIL is set? Same here, there is a fallback for order-0 if "high" fails, it is more likely NO_FAIL succeed for order-0. Thus keeping NOFAIL for high-order sounds like not a good approach to me.

We should avoid high order allocations with GFP_NOFAIL at all cost.

...

"... at the same place?"

Do you mean in the __vmalloc_node_range_noprof()?

__vmalloc_node_range_noprof() -> __vmalloc_area_node(gfp_mask) -> vm_area_alloc_pages()

if, so it is not straight forward, i.e. there is one more allocation:

<snip> static void *__vmalloc_area_node(struct vm_struct *area, gfp_t gfp_mask, pgprot_t prot, unsigned int page_shift, int node) { ... /* Please note that the recursion is strictly bounded. */ if (array_size > PAGE_SIZE) { area->pages = __vmalloc_node_noprof(array_size, 1, nested_gfp, node, area->caller); } else { area->pages = kmalloc_node_noprof(array_size, nested_gfp, node); } ... } <snip>

whereas it is easier to do it inside of the __vmalloc_area_node().

Right. The allocation path is quite convoluted here. If it is just too much of a hassle to implement NOFAIL at a single place then we should aim at reducing that. Having that at 3 different layers is just begging for inconsistences.

-- Michal Hocko SUSE Labs

Uladzislau Rezki

12:47 p.m.

On Tue, Aug 27, 2024 at 08:49:35AM +0200, Michal Hocko wrote:

...

On Mon 26-08-24 14:38:40, Uladzislau Rezki wrote:

...
On Mon, Aug 26, 2024 at 09:52:42AM +0200, Michal Hocko wrote:

...
On Fri 23-08-24 18:42:47, Uladzislau Rezki wrote: [...]

...
@@ -3666,7 +3655,16 @@ static void *__vmalloc_area_node(struct vm_struct *area, gfp_t gfp_mask, set_vm_area_page_order(area, page_shift - PAGE_SHIFT); page_order = vm_area_page_order(area);

area->nr_pages = vm_area_alloc_pages(gfp_mask | __GFP_NOWARN,
/*
* Higher order nofail allocations are really expensive and
* potentially dangerous (pre-mature OOM, disruptive reclaim
* and compaction etc.
*
* Please note, the __vmalloc_node_range_noprof() falls-back
* to order-0 pages if high-order attempt has been unsuccessful.
*/
area->nr_pages = vm_area_alloc_pages(page_order ?
gfp_mask &= ~__GFP_NOFAIL : gfp_mask | __GFP_NOWARN,
node, page_order, nr_small_pages, area->pages);
atomic_long_add(area->nr_pages, &nr_vmalloc_pages);

<snip>

Is that aligned with your wish?
I am not a great fan of modifying gfp_mask inside the ternary operator like that. It makes the code harder to read. Is there any actual reason to simply drop GFP_NOFAIL unconditionally and rely do the NOFAIL handling for all orders at the same place?
So, for bulk we have below:

/* gfp_t bulk_gfp = gfp & ~__GFP_NOFAIL; */

I am not sure if we need it but it says it does not support it which is not clear for me why we have to drop __GFP_NOFAIL for bulk(). There is a fallback to a single page allocator. If passing __GFP_NOFAIL does not trigger any warning or panic a system, then i do not follow why we drop that flag.

Is that odd?
I suspect this was a pre-caution more than anything.

OK, then i drop it.

...

...

High-order allocations. Do you think we should not care much about

it when __GFP_NOFAIL is set? Same here, there is a fallback for order-0 if "high" fails, it is more likely NO_FAIL succeed for order-0. Thus keeping NOFAIL for high-order sounds like not a good approach to me.

We should avoid high order allocations with GFP_NOFAIL at all cost.

What do you propose here? Fail such request?

...

...

"... at the same place?"

Do you mean in the __vmalloc_node_range_noprof()?

__vmalloc_node_range_noprof() -> __vmalloc_area_node(gfp_mask) -> vm_area_alloc_pages()

if, so it is not straight forward, i.e. there is one more allocation:

<snip> static void *__vmalloc_area_node(struct vm_struct *area, gfp_t gfp_mask, pgprot_t prot, unsigned int page_shift, int node) { ... /* Please note that the recursion is strictly bounded. */ if (array_size > PAGE_SIZE) { area->pages = __vmalloc_node_noprof(array_size, 1, nested_gfp, node, area->caller); } else { area->pages = kmalloc_node_noprof(array_size, nested_gfp, node); } ... } <snip>

whereas it is easier to do it inside of the __vmalloc_area_node().

Right. The allocation path is quite convoluted here. If it is just too much of a hassle to implement NOFAIL at a single place then we should aim at reducing that. Having that at 3 different layers is just begging for inconsistences.

Hard to not agree :)

-- Uladzislau Rezki

Michal Hocko

1:37 p.m.

On Tue 27-08-24 14:47:30, Uladzislau Rezki wrote:

...

On Tue, Aug 27, 2024 at 08:49:35AM +0200, Michal Hocko wrote:

[...]

...

...
...

High-order allocations. Do you think we should not care much about

it when __GFP_NOFAIL is set? Same here, there is a fallback for order-0 if "high" fails, it is more likely NO_FAIL succeed for order-0. Thus keeping NOFAIL for high-order sounds like not a good approach to me.

We should avoid high order allocations with GFP_NOFAIL at all cost.

What do you propose here? Fail such request?

We shouldn't have any hard requirements for higher order allocations in the vmalloc right? In other words we can always fallback to base pages.

-- Michal Hocko SUSE Labs

Uladzislau Rezki

3:29 p.m.

On Tue, Aug 27, 2024 at 03:37:38PM +0200, Michal Hocko wrote:

...

On Tue 27-08-24 14:47:30, Uladzislau Rezki wrote:

...
On Tue, Aug 26, 2024 at 08:49:35AM +0200, Michal Hocko wrote:

[...]

...
...
...

High-order allocations. Do you think we should not care much about

it when __GFP_NOFAIL is set? Same here, there is a fallback for order-0 if "high" fails, it is more likely NO_FAIL succeed for order-0. Thus keeping NOFAIL for high-order sounds like not a good approach to me.

We should avoid high order allocations with GFP_NOFAIL at all cost.

What do you propose here? Fail such request?

We shouldn't have any hard requirements for higher order allocations in the vmalloc right? In other words we can always fallback to base pages.

We always drop NOFAIL for high-order, if it fails we fall-back to order-0. I got the feeling that you wanted just bail-out fully if high-order and NOFAIL.

-- Uladzislau Rezki

Michal Hocko

28 Aug 28 Aug

7:14 a.m.

On Tue 27-08-24 17:29:34, Uladzislau Rezki wrote:

...

On Tue, Aug 27, 2024 at 03:37:38PM +0200, Michal Hocko wrote:

...
On Tue 27-08-24 14:47:30, Uladzislau Rezki wrote:

...
On Tue, Aug 26, 2024 at 08:49:35AM +0200, Michal Hocko wrote:

[...]

...
...
...

High-order allocations. Do you think we should not care much about

it when __GFP_NOFAIL is set? Same here, there is a fallback for order-0 if "high" fails, it is more likely NO_FAIL succeed for order-0. Thus keeping NOFAIL for high-order sounds like not a good approach to me.

We should avoid high order allocations with GFP_NOFAIL at all cost.

What do you propose here? Fail such request?

We shouldn't have any hard requirements for higher order allocations in the vmalloc right? In other words we can always fallback to base pages.

We always drop NOFAIL for high-order, if it fails we fall-back to order-0. I got the feeling that you wanted just bail-out fully if high-order and NOFAIL.

Nope. We should always fall back to order 0 for both NOFAIL and regular vmalloc allocations.

-- Michal Hocko SUSE Labs

Uladzislau Rezki

5:23 p.m.

On Wed, Aug 28, 2024 at 09:14:53AM +0200, Michal Hocko wrote:

...

On Tue 27-08-24 17:29:34, Uladzislau Rezki wrote:

...
On Tue, Aug 27, 2024 at 03:37:38PM +0200, Michal Hocko wrote:

...
On Tue 27-08-24 14:47:30, Uladzislau Rezki wrote:

...
On Tue, Aug 26, 2024 at 08:49:35AM +0200, Michal Hocko wrote:

[...]

...
...
...

High-order allocations. Do you think we should not care much about

it when __GFP_NOFAIL is set? Same here, there is a fallback for order-0 if "high" fails, it is more likely NO_FAIL succeed for order-0. Thus keeping NOFAIL for high-order sounds like not a good approach to me.

We should avoid high order allocations with GFP_NOFAIL at all cost.

What do you propose here? Fail such request?

We shouldn't have any hard requirements for higher order allocations in the vmalloc right? In other words we can always fallback to base pages.

We always drop NOFAIL for high-order, if it fails we fall-back to order-0. I got the feeling that you wanted just bail-out fully if high-order and NOFAIL.

Nope. We should always fall back to order 0 for both NOFAIL and regular vmalloc allocations.

Good.

Thanks for the ACK!

-- Uladzislau Rezki

Uladzislau Rezki

19 Aug 19 Aug

11:59 a.m.

On Fri, Aug 16, 2024 at 07:46:26PM +0800, Hailong Liu wrote:

...

On Fri, 16. Aug 12:13, Uladzislau Rezki wrote:

...
On Fri, Aug 16, 2024 at 05:12:32PM +0800, Hailong Liu wrote:

...
On Thu, 15. Aug 22:07, Andrew Morton wrote:

...
On Fri, 9 Aug 2024 11:41:42 +0200 Uladzislau Rezki urezki@gmail.com wrote:

...
...
> Acked-by: Barry Song baohua@kernel.org > > because we already have a fallback here: > > void *__vmalloc_node_range_noprof : > > fail: > if (shift > PAGE_SHIFT) { > shift = PAGE_SHIFT; > align = real_align; > size = real_size; > goto again; > }

This really deserves a comment because this is not really clear at all. The code is also fragile and it would benefit from some re-org.

Thanks for the fix.

Acked-by: Michal Hocko mhocko@suse.com

I agree. This is only clear for people who know the code. A "fallback" to order-0 should be commented.

It's been a week. Could someone please propose a fixup patch to add this comment?

Hi Andrew:

Do you mean that I need to send a v2 patch with the the comments included?

It is better to post v2.

Got it.

...
But before, could you please comment on:

in case of order-0, bulk path may easily fail and fallback to the single page allocator. If an request is marked as NO_FAIL, i am talking about order-0 request, your change breaks GFP_NOFAIL for !order.

Am i missing something obvious?

For order-0, alloc_pages(GFP_X | __GFP_NOFAIL, 0), buddy allocator will handle the flag correctly. IMO we don't need to handle the flag here.

Agree. As for comment, i meant to comment the below fallback:

<snip> fail: if (shift > PAGE_SHIFT) { shift = PAGE_SHIFT; align = real_align; size = real_size; goto again; } <snip>

-- Uladzislau Rezki

Hailong Liu

12:57 p.m.

On Mon, 19. Aug 13:59, Uladzislau Rezki wrote:

...

On Fri, Aug 16, 2024 at 07:46:26PM +0800, Hailong Liu wrote:

...
On Fri, 16. Aug 12:13, Uladzislau Rezki wrote:

...
On Fri, Aug 16, 2024 at 05:12:32PM +0800, Hailong Liu wrote:

...
On Thu, 15. Aug 22:07, Andrew Morton wrote:

...
On Fri, 9 Aug 2024 11:41:42 +0200 Uladzislau Rezki urezki@gmail.com wrote:

...
> > Acked-by: Barry Song baohua@kernel.org > > > > because we already have a fallback here: > > > > void *__vmalloc_node_range_noprof : > > > > fail: > > if (shift > PAGE_SHIFT) { > > shift = PAGE_SHIFT; > > align = real_align; > > size = real_size; > > goto again; > > } > > This really deserves a comment because this is not really clear at all. > The code is also fragile and it would benefit from some re-org. > > Thanks for the fix. > > Acked-by: Michal Hocko mhocko@suse.com > I agree. This is only clear for people who know the code. A "fallback" to order-0 should be commented.

It's been a week. Could someone please propose a fixup patch to add this comment?

Hi Andrew:

Do you mean that I need to send a v2 patch with the the comments included?

It is better to post v2.

Got it.

...
But before, could you please comment on:

in case of order-0, bulk path may easily fail and fallback to the single page allocator. If an request is marked as NO_FAIL, i am talking about order-0 request, your change breaks GFP_NOFAIL for !order.

Am i missing something obvious?

For order-0, alloc_pages(GFP_X | __GFP_NOFAIL, 0), buddy allocator will handle the flag correctly. IMO we don't need to handle the flag here.

Agree. As for comment, i meant to comment the below fallback:

Michal send a craft that make nofail logic more clearer and I check the branch found Andrew already merged in -stable branch. So we can include these with a new patch.

...

<snip> fail: if (shift > PAGE_SHIFT) { shift = PAGE_SHIFT; align = real_align; size = real_size; goto again; } <snip>

-- Uladzislau Rezki

Help you, Help me, Hailong.

Uladzislau Rezki

1:38 p.m.

On Mon, Aug 19, 2024 at 08:57:38PM +0800, Hailong Liu wrote:

...

On Mon, 19. Aug 13:59, Uladzislau Rezki wrote:

...
On Fri, Aug 16, 2024 at 07:46:26PM +0800, Hailong Liu wrote:

...
On Fri, 16. Aug 12:13, Uladzislau Rezki wrote:

...
On Fri, Aug 16, 2024 at 05:12:32PM +0800, Hailong Liu wrote:

...
On Thu, 15. Aug 22:07, Andrew Morton wrote:

...
On Fri, 9 Aug 2024 11:41:42 +0200 Uladzislau Rezki urezki@gmail.com wrote:

> > > Acked-by: Barry Song baohua@kernel.org > > > > > > because we already have a fallback here: > > > > > > void *__vmalloc_node_range_noprof : > > > > > > fail: > > > if (shift > PAGE_SHIFT) { > > > shift = PAGE_SHIFT; > > > align = real_align; > > > size = real_size; > > > goto again; > > > } > > > > This really deserves a comment because this is not really clear at all. > > The code is also fragile and it would benefit from some re-org. > > > > Thanks for the fix. > > > > Acked-by: Michal Hocko mhocko@suse.com > > > I agree. This is only clear for people who know the code. A "fallback" > to order-0 should be commented.

It's been a week. Could someone please propose a fixup patch to add this comment?

Hi Andrew:

Do you mean that I need to send a v2 patch with the the comments included?

It is better to post v2.

Got it.

...
But before, could you please comment on:

in case of order-0, bulk path may easily fail and fallback to the single page allocator. If an request is marked as NO_FAIL, i am talking about order-0 request, your change breaks GFP_NOFAIL for !order.

Am i missing something obvious?

For order-0, alloc_pages(GFP_X | __GFP_NOFAIL, 0), buddy allocator will handle the flag correctly. IMO we don't need to handle the flag here.

Agree. As for comment, i meant to comment the below fallback:

Michal send a craft that make nofail logic more clearer and I check the branch found Andrew already merged in -stable branch. So we can include these with a new patch.

Just to confirm. Will you send an extra patch with the comment?

-- Uladzislau Rezki

Uladzislau Rezki

1:45 p.m.

On Mon, Aug 19, 2024 at 03:38:51PM +0200, Uladzislau Rezki wrote:

...

On Mon, Aug 19, 2024 at 08:57:38PM +0800, Hailong Liu wrote:

...
On Mon, 19. Aug 13:59, Uladzislau Rezki wrote:

...
On Fri, Aug 16, 2024 at 07:46:26PM +0800, Hailong Liu wrote:

...
On Fri, 16. Aug 12:13, Uladzislau Rezki wrote:

...
On Fri, Aug 16, 2024 at 05:12:32PM +0800, Hailong Liu wrote:

...
On Thu, 15. Aug 22:07, Andrew Morton wrote: > On Fri, 9 Aug 2024 11:41:42 +0200 Uladzislau Rezki urezki@gmail.com wrote: > > > > > Acked-by: Barry Song baohua@kernel.org > > > > > > > > because we already have a fallback here: > > > > > > > > void *__vmalloc_node_range_noprof : > > > > > > > > fail: > > > > if (shift > PAGE_SHIFT) { > > > > shift = PAGE_SHIFT; > > > > align = real_align; > > > > size = real_size; > > > > goto again; > > > > } > > > > > > This really deserves a comment because this is not really clear at all. > > > The code is also fragile and it would benefit from some re-org. > > > > > > Thanks for the fix. > > > > > > Acked-by: Michal Hocko mhocko@suse.com > > > > > I agree. This is only clear for people who know the code. A "fallback" > > to order-0 should be commented. > > It's been a week. Could someone please propose a fixup patch to add > this comment?

Hi Andrew:

Do you mean that I need to send a v2 patch with the the comments included?

It is better to post v2.

Got it.

...
But before, could you please comment on:

in case of order-0, bulk path may easily fail and fallback to the single page allocator. If an request is marked as NO_FAIL, i am talking about order-0 request, your change breaks GFP_NOFAIL for !order.

Am i missing something obvious?

For order-0, alloc_pages(GFP_X | __GFP_NOFAIL, 0), buddy allocator will handle the flag correctly. IMO we don't need to handle the flag here.

Agree. As for comment, i meant to comment the below fallback:

Michal send a craft that make nofail logic more clearer and I check the branch found Andrew already merged in -stable branch. So we can include these with a new patch.

Just to confirm. Will you send an extra patch with the comment?

Also, an idea to handle NOFAIL outside of vm_area_alloc_pages() looks sounds good to me.

-- Uladzislau Rezki

Hailong Liu

20 Aug 20 Aug

1:59 a.m.

On Mon, 19. Aug 15:38, Uladzislau Rezki wrote:

...

On Mon, Aug 19, 2024 at 08:57:38PM +0800, Hailong Liu wrote:

...
On Mon, 19. Aug 13:59, Uladzislau Rezki wrote:

...
On Fri, Aug 16, 2024 at 07:46:26PM +0800, Hailong Liu wrote:

...
On Fri, 16. Aug 12:13, Uladzislau Rezki wrote:

...
On Fri, Aug 16, 2024 at 05:12:32PM +0800, Hailong Liu wrote:

...
On Thu, 15. Aug 22:07, Andrew Morton wrote: > On Fri, 9 Aug 2024 11:41:42 +0200 Uladzislau Rezki urezki@gmail.com wrote: > > > > > Acked-by: Barry Song baohua@kernel.org > > > > > > > > because we already have a fallback here: > > > > > > > > void *__vmalloc_node_range_noprof : > > > > > > > > fail: > > > > if (shift > PAGE_SHIFT) { > > > > shift = PAGE_SHIFT; > > > > align = real_align; > > > > size = real_size; > > > > goto again; > > > > } > > > > > > This really deserves a comment because this is not really clear at all. > > > The code is also fragile and it would benefit from some re-org. > > > > > > Thanks for the fix. > > > > > > Acked-by: Michal Hocko mhocko@suse.com > > > > > I agree. This is only clear for people who know the code. A "fallback" > > to order-0 should be commented. > > It's been a week. Could someone please propose a fixup patch to add > this comment?

Hi Andrew:

Do you mean that I need to send a v2 patch with the the comments included?

It is better to post v2.

Got it.

...
But before, could you please comment on:

in case of order-0, bulk path may easily fail and fallback to the single page allocator. If an request is marked as NO_FAIL, i am talking about order-0 request, your change breaks GFP_NOFAIL for !order.

Am i missing something obvious?

For order-0, alloc_pages(GFP_X | __GFP_NOFAIL, 0), buddy allocator will handle the flag correctly. IMO we don't need to handle the flag here.

Agree. As for comment, i meant to comment the below fallback:

Michal send a craft that make nofail logic more clearer and I check the branch found Andrew already merged in -stable branch. So we can include these with a new patch.

Just to confirm. Will you send an extra patch with the comment?

If this is not urgent, I can send this patch later this week. :)

...

-- Uladzislau Rezki

Help you, Help me, Hailong.

Uladzislau Rezki

6:44 a.m.

On Tue, Aug 20, 2024 at 09:59:50AM +0800, Hailong Liu wrote:

...

On Mon, 19. Aug 15:38, Uladzislau Rezki wrote:

...
On Mon, Aug 19, 2024 at 08:57:38PM +0800, Hailong Liu wrote:

...
On Mon, 19. Aug 13:59, Uladzislau Rezki wrote:

...
On Fri, Aug 16, 2024 at 07:46:26PM +0800, Hailong Liu wrote:

...
On Fri, 16. Aug 12:13, Uladzislau Rezki wrote:

...
On Fri, Aug 16, 2024 at 05:12:32PM +0800, Hailong Liu wrote: > On Thu, 15. Aug 22:07, Andrew Morton wrote: > > On Fri, 9 Aug 2024 11:41:42 +0200 Uladzislau Rezki urezki@gmail.com wrote: > > > > > > > Acked-by: Barry Song baohua@kernel.org > > > > > > > > > > because we already have a fallback here: > > > > > > > > > > void *__vmalloc_node_range_noprof : > > > > > > > > > > fail: > > > > > if (shift > PAGE_SHIFT) { > > > > > shift = PAGE_SHIFT; > > > > > align = real_align; > > > > > size = real_size; > > > > > goto again; > > > > > } > > > > > > > > This really deserves a comment because this is not really clear at all. > > > > The code is also fragile and it would benefit from some re-org. > > > > > > > > Thanks for the fix. > > > > > > > > Acked-by: Michal Hocko mhocko@suse.com > > > > > > > I agree. This is only clear for people who know the code. A "fallback" > > > to order-0 should be commented. > > > > It's been a week. Could someone please propose a fixup patch to add > > this comment? > > Hi Andrew: > > Do you mean that I need to send a v2 patch with the the comments included? > It is better to post v2.

Got it.

...
But before, could you please comment on:

in case of order-0, bulk path may easily fail and fallback to the single page allocator. If an request is marked as NO_FAIL, i am talking about order-0 request, your change breaks GFP_NOFAIL for !order.

Am i missing something obvious?

For order-0, alloc_pages(GFP_X | __GFP_NOFAIL, 0), buddy allocator will handle the flag correctly. IMO we don't need to handle the flag here.

Agree. As for comment, i meant to comment the below fallback:

Michal send a craft that make nofail logic more clearer and I check the branch found Andrew already merged in -stable branch. So we can include these with a new patch.

Just to confirm. Will you send an extra patch with the comment?

If this is not urgent, I can send this patch later this week. :)

This is for synchronization, so we both do not do a double work :)

-- Uladzislau Rezki

Hailong Liu

6:54 a.m.

On Tue, 20. Aug 08:44, Uladzislau Rezki wrote:

...

On Tue, Aug 20, 2024 at 09:59:50AM +0800, Hailong Liu wrote:

...
On Mon, 19. Aug 15:38, Uladzislau Rezki wrote:

...
On Mon, Aug 19, 2024 at 08:57:38PM +0800, Hailong Liu wrote:

...
On Mon, 19. Aug 13:59, Uladzislau Rezki wrote:

...
On Fri, Aug 16, 2024 at 07:46:26PM +0800, Hailong Liu wrote:

...
On Fri, 16. Aug 12:13, Uladzislau Rezki wrote: > On Fri, Aug 16, 2024 at 05:12:32PM +0800, Hailong Liu wrote: > > On Thu, 15. Aug 22:07, Andrew Morton wrote: > > > On Fri, 9 Aug 2024 11:41:42 +0200 Uladzislau Rezki urezki@gmail.com wrote: > > > > > > > > > Acked-by: Barry Song baohua@kernel.org > > > > > > > > > > > > because we already have a fallback here: > > > > > > > > > > > > void *__vmalloc_node_range_noprof : > > > > > > > > > > > > fail: > > > > > > if (shift > PAGE_SHIFT) { > > > > > > shift = PAGE_SHIFT; > > > > > > align = real_align; > > > > > > size = real_size; > > > > > > goto again; > > > > > > } > > > > > > > > > > This really deserves a comment because this is not really clear at all. > > > > > The code is also fragile and it would benefit from some re-org. > > > > > > > > > > Thanks for the fix. > > > > > > > > > > Acked-by: Michal Hocko mhocko@suse.com > > > > > > > > > I agree. This is only clear for people who know the code. A "fallback" > > > > to order-0 should be commented. > > > > > > It's been a week. Could someone please propose a fixup patch to add > > > this comment? > > > > Hi Andrew: > > > > Do you mean that I need to send a v2 patch with the the comments included? > > > It is better to post v2. Got it.

> > But before, could you please comment on: > > in case of order-0, bulk path may easily fail and fallback to the single > page allocator. If an request is marked as NO_FAIL, i am talking about > order-0 request, your change breaks GFP_NOFAIL for !order. > > Am i missing something obvious? For order-0, alloc_pages(GFP_X | __GFP_NOFAIL, 0), buddy allocator will handle the flag correctly. IMO we don't need to handle the flag here.

Agree. As for comment, i meant to comment the below fallback:

Michal send a craft that make nofail logic more clearer and I check the branch found Andrew already merged in -stable branch. So we can include these with a new patch.

Just to confirm. Will you send an extra patch with the comment?

If this is not urgent, I can send this patch later this week. :)

This is for synchronization, so we both do not do a double work :)

Ahh I guess you already have a plan, so I don't need to do this. Wating for your patch :)

...

-- Uladzislau Rezki

Help you, Help me, Hailong.

Baoquan He

16 Aug 16 Aug

4:11 p.m.

On 08/16/24 at 12:13pm, Uladzislau Rezki wrote:

...

On Fri, Aug 16, 2024 at 05:12:32PM +0800, Hailong Liu wrote:

...
On Thu, 15. Aug 22:07, Andrew Morton wrote:

...
On Fri, 9 Aug 2024 11:41:42 +0200 Uladzislau Rezki urezki@gmail.com wrote:

...
...
...
Acked-by: Barry Song baohua@kernel.org

because we already have a fallback here:

void *__vmalloc_node_range_noprof :

fail: if (shift > PAGE_SHIFT) { shift = PAGE_SHIFT; align = real_align; size = real_size; goto again; }

This really deserves a comment because this is not really clear at all. The code is also fragile and it would benefit from some re-org.

Thanks for the fix.

Acked-by: Michal Hocko mhocko@suse.com

I agree. This is only clear for people who know the code. A "fallback" to order-0 should be commented.

It's been a week. Could someone please propose a fixup patch to add this comment?

Hi Andrew:

Do you mean that I need to send a v2 patch with the the comments included?

It is better to post v2.

But before, could you please comment on:

in case of order-0, bulk path may easily fail and fallback to the single page allocator. If an request is marked as NO_FAIL, i am talking about order-0 request, your change breaks GFP_NOFAIL for !order.

In case order-0, bulk_gfp masks off __GFP_NOFAIL, but alloc_gfp doesn't. So alloc_gfp has __GFP_NOFAIL in fallback, it won't be failed by alloc_pages().

Baoquan He

4:15 p.m.

On 08/17/24 at 12:11am, Baoquan He wrote:

...

On 08/16/24 at 12:13pm, Uladzislau Rezki wrote:

...
On Fri, Aug 16, 2024 at 05:12:32PM +0800, Hailong Liu wrote:

...
On Thu, 15. Aug 22:07, Andrew Morton wrote:

...
On Fri, 9 Aug 2024 11:41:42 +0200 Uladzislau Rezki urezki@gmail.com wrote:

...
...
> Acked-by: Barry Song baohua@kernel.org > > because we already have a fallback here: > > void *__vmalloc_node_range_noprof : > > fail: > if (shift > PAGE_SHIFT) { > shift = PAGE_SHIFT; > align = real_align; > size = real_size; > goto again; > }

This really deserves a comment because this is not really clear at all. The code is also fragile and it would benefit from some re-org.

Thanks for the fix.

Acked-by: Michal Hocko mhocko@suse.com

I agree. This is only clear for people who know the code. A "fallback" to order-0 should be commented.

It's been a week. Could someone please propose a fixup patch to add this comment?

Hi Andrew:

Do you mean that I need to send a v2 patch with the the comments included?

It is better to post v2.

But before, could you please comment on:

in case of order-0, bulk path may easily fail and fallback to the single page allocator. If an request is marked as NO_FAIL, i am talking about order-0 request, your change breaks GFP_NOFAIL for !order.

In case order-0, bulk_gfp masks off __GFP_NOFAIL, but alloc_gfp doesn't. So alloc_gfp has __GFP_NOFAIL in fallback, it won't be failed by alloc_pages().

Please ignore this, I didn't update my local mail box, didn't see Hailong's reply.

319

days inactive

339

days old

linux-stable-mirror@lists.linaro.org

29 comments

participants

tags (0)

participants (6)

Andrew Morton
Baoquan He
Barry Song
Hailong Liu
Michal Hocko
Uladzislau Rezki