The patch titled
     Subject: mm/page_alloc: keep track of free highatomic
has been added to the -mm mm-hotfixes-unstable branch.  Its filename is
     mm-page_alloc-keep-track-of-free-highatomic.patch
This patch will shortly appear at https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patches...
This patch will later appear in the mm-hotfixes-unstable branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
   a) Consider who else should be cc'ed
   b) Prefer to cc a suitable mailing list as well
   c) Ideally: find the original patch on the mailing list and do a
      reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm and is updated there every 2-3 working days
------------------------------------------------------
From: Yu Zhao <yuzhao@google.com>
Subject: mm/page_alloc: keep track of free highatomic
Date: Mon, 28 Oct 2024 12:26:53 -0600
OOM kills due to vastly overestimated free highatomic reserves were observed:
 ... invoked oom-killer: gfp_mask=0x100cca(GFP_HIGHUSER_MOVABLE), order=0 ...
 Node 0 Normal free:1482936kB boost:0kB min:410416kB low:739404kB high:1068392kB reserved_highatomic:1073152KB ...
 Node 0 Normal: 1292*4kB (ME) 1920*8kB (E) 383*16kB (UE) 220*32kB (ME) 340*64kB (E) 2155*128kB (UE) 3243*256kB (UE) 615*512kB (U) 1*1024kB (M) 0*2048kB 0*4096kB = 1477408kB
The second line above shows that the OOM kill was due to the following condition:
free (1482936kB) - reserved_highatomic (1073152kB) = 409784kB < min (410416kB)
And the third line shows there were no free pages in any MIGRATE_HIGHATOMIC pageblocks, which otherwise would show up as type 'H'. Therefore __zone_watermark_unusable_free() underestimated the usable free memory by over 1GB, which resulted in the unnecessary OOM kill above.
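For reference, the failing check reduces to roughly the following (a simplified userspace sketch using the numbers from the report above, not the actual kernel code in __zone_watermark_ok()/__zone_watermark_unusable_free()):

  #include <stdbool.h>
  #include <stdio.h>

  /* Pre-patch estimate: treat the entire highatomic reserve as unusable,
   * even when none of its pages are actually free.  Values are in kB. */
  static bool watermark_ok_prepatch(unsigned long free_kb,
                                    unsigned long reserved_highatomic_kb,
                                    unsigned long min_kb)
  {
          unsigned long unusable_kb = reserved_highatomic_kb;

          return free_kb - unusable_kb > min_kb;
  }

  int main(void)
  {
          /* 1482936 - 1073152 = 409784 < 410416, so the check fails and
           * the allocation heads toward the OOM killer path. */
          printf("%s\n", watermark_ok_prepatch(1482936, 1073152, 410416) ?
                 "ok" : "below min");
          return 0;
  }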
The comment in __zone_watermark_unusable_free() warns about the potential risk, i.e.,
If the caller does not have rights to reserves below the min watermark then subtract the high-atomic reserves. This will over-estimate the size of the atomic reserve but it avoids a search.
However, it is possible to keep track of free pages in reserved highatomic pageblocks with a new per-zone counter nr_free_highatomic protected by the zone lock, to avoid a search when calculating the usable free memory. The cost would be minimal, i.e., simple arithmetic in the highatomic alloc/free/move paths.
Note that since nr_free_highatomic can be relatively small, using a per-cpu counter might cause too much drift and defeat its purpose, in addition to the extra memory overhead.
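To put rough numbers on that drift (illustrative arithmetic only; the per-CPU delta threshold, CPU count and pageblock size below are assumed example values, not measurements):

  #include <stdio.h>

  int main(void)
  {
          /* Assumed values for illustration only. */
          unsigned long ncpus = 64;       /* online CPUs */
          unsigned long threshold = 125;  /* per-CPU delta threshold, in pages */
          unsigned long page_kb = 4;      /* page size in kB */

          /* Worst case, every CPU holds up to 'threshold' pages of unfolded
           * delta, so a per-CPU counter could be off by this much. */
          unsigned long drift_kb = ncpus * threshold * page_kb;

          /* ~32000kB here, which can be comparable to nr_free_highatomic
           * itself when only a few pageblocks' worth (e.g. 2048kB each) of
           * reserved pages are actually free. */
          printf("worst-case drift: %lukB\n", drift_kb);
          return 0;
  }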
Link: https://lkml.kernel.org/r/20241028182653.3420139-1-yuzhao@google.com
Signed-off-by: Yu Zhao <yuzhao@google.com>
Reported-by: Link Lin <linkl@google.com>
Acked-by: David Rientjes <rientjes@google.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: <stable@vger.kernel.org>	[6.12+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
 include/linux/mmzone.h |    1 +
 mm/page_alloc.c        |   10 +++++++---
 2 files changed, 8 insertions(+), 3 deletions(-)
--- a/include/linux/mmzone.h~mm-page_alloc-keep-track-of-free-highatomic
+++ a/include/linux/mmzone.h
@@ -823,6 +823,7 @@ struct zone {
 	unsigned long watermark_boost;
 
 	unsigned long nr_reserved_highatomic;
+	unsigned long nr_free_highatomic;
 
 	/*
 	 * We don't know if the memory that we're going to allocate will be
--- a/mm/page_alloc.c~mm-page_alloc-keep-track-of-free-highatomic
+++ a/mm/page_alloc.c
@@ -635,6 +635,8 @@ compaction_capture(struct capture_contro
 static inline void account_freepages(struct zone *zone, int nr_pages,
 				     int migratetype)
 {
+	lockdep_assert_held(&zone->lock);
+
 	if (is_migrate_isolate(migratetype))
 		return;
 
@@ -642,6 +644,9 @@ static inline void account_freepages(str
 
 	if (is_migrate_cma(migratetype))
 		__mod_zone_page_state(zone, NR_FREE_CMA_PAGES, nr_pages);
+
+	if (is_migrate_highatomic(migratetype))
+		WRITE_ONCE(zone->nr_free_highatomic, zone->nr_free_highatomic + nr_pages);
 }
 
 /* Used for pages not on another list */
@@ -3081,11 +3086,10 @@ static inline long __zone_watermark_unus
 
 	/*
 	 * If the caller does not have rights to reserves below the min
-	 * watermark then subtract the high-atomic reserves. This will
-	 * over-estimate the size of the atomic reserve but it avoids a search.
+	 * watermark then subtract the free pages reserved for highatomic.
 	 */
 	if (likely(!(alloc_flags & ALLOC_RESERVES)))
-		unusable_free += z->nr_reserved_highatomic;
+		unusable_free += READ_ONCE(z->nr_free_highatomic);
 
 #ifdef CONFIG_CMA
 	/* If allocation can't use CMA areas don't use free CMA pages */
_
Patches currently in -mm which might be from yuzhao@google.com are
mm-allow-set-clear-page_type-again.patch
mm-multi-gen-lru-remove-mm_leaf_old-and-mm_nonleaf_total-stats.patch
mm-multi-gen-lru-use-pteppmdp_clear_young_notify.patch
mm-page_alloc-keep-track-of-free-highatomic.patch
On 10/29/24 01:58, Andrew Morton wrote:
> The patch titled
>      Subject: mm/page_alloc: keep track of free highatomic
> has been added to the -mm mm-hotfixes-unstable branch.  Its filename is
>      mm-page_alloc-keep-track-of-free-highatomic.patch
In case of mm-hotfixes, the cc stable for 6.12 is unnecessary as it should make it there directly. Perhaps it's the best way, yeah.
Thanks, Vlastimil
On Tue, 29 Oct 2024 10:05:43 +0100 Vlastimil Babka <vbabka@suse.cz> wrote:

> On 10/29/24 01:58, Andrew Morton wrote:
>> The patch titled
>>      Subject: mm/page_alloc: keep track of free highatomic
>> has been added to the -mm mm-hotfixes-unstable branch.  Its filename is
>>      mm-page_alloc-keep-track-of-free-highatomic.patch
>
> In case of mm-hotfixes, the cc stable for 6.12 is unnecessary as it should
> make it there directly. Perhaps it's the best way, yeah.
Oh, yeah, right. There's no Fixes:. How do we know it's 6.12+?
On 10/29/24 20:30, Andrew Morton wrote:
> On Tue, 29 Oct 2024 10:05:43 +0100 Vlastimil Babka <vbabka@suse.cz> wrote:
>> On 10/29/24 01:58, Andrew Morton wrote:
>>> The patch titled
>>>      Subject: mm/page_alloc: keep track of free highatomic
>>> has been added to the -mm mm-hotfixes-unstable branch.  Its filename is
>>>      mm-page_alloc-keep-track-of-free-highatomic.patch
>>
>> In case of mm-hotfixes, the cc stable for 6.12 is unnecessary as it should
>> make it there directly. Perhaps it's the best way, yeah.
>
> Oh, yeah, right. There's no Fixes:. How do we know it's 6.12+?
Fixes: 0aaa29a56e4f ("mm, page_alloc: reserve pageblocks for high-order atomic allocations on demand")
Depends-on: e0932b6c1f94 ("mm: page_alloc: consolidate free page accounting")
We've discussed [1] how this is not good stable material due to the dependency from 6.10; the problem is old, nobody has reported it until now, and it doesn't meet the stable kernel rules. But targeting 6.12 (which currently happens via the hotfixes branch) would make sense because it's most likely the next LTS and already has the dependency.
So with the tags above we're not proposing it for stable backports ourselves, but anyone who considers backporting it should have sufficient information.
[1] https://lore.kernel.org/all/afd9f99f-f49a-41c7-b987-9e59a9d296ad@suse.cz/