In the effort to reduce zombie memcgs [1], it was discovered that the memcg LRU doesn't apply enough pressure on offlined memcgs. Specifically, instead of rotating them to the tail of the current generation (MEMCG_LRU_TAIL) for a second attempt, it moves them to the next generation (MEMCG_LRU_YOUNG) after the first attempt.
Not applying enough pressure on offlined memcgs can cause them to build up, and this can be particularly harmful to memory-constrained systems.
On Pixel 8 Pro, launching apps for 50 cycles: Before After Change Zombie memcgs 45 35 -22%
[1] https://lore.kernel.org/CABdmKX2M6koq4Q0Cmp_-=wbP0Qa190HdEGGaHfxNS05gAkUtPA@...
Fixes: e4dde56cd208 ("mm: multi-gen LRU: per-node lru_gen_folio lists") Signed-off-by: Yu Zhao yuzhao@google.com Reported-by: T.J. Mercier tjmercier@google.com Tested-by: T.J. Mercier tjmercier@google.com Cc: stable@vger.kernel.org --- include/linux/mmzone.h | 8 ++++---- mm/vmscan.c | 24 ++++++++++++++++-------- 2 files changed, 20 insertions(+), 12 deletions(-)
diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h index e3093ef9530f..2efd3be484fd 100644 --- a/include/linux/mmzone.h +++ b/include/linux/mmzone.h @@ -524,10 +524,10 @@ void lru_gen_look_around(struct page_vma_mapped_walk *pvmw); * 1. Exceeding the soft limit, which triggers MEMCG_LRU_HEAD; * 2. The first attempt to reclaim a memcg below low, which triggers * MEMCG_LRU_TAIL; - * 3. The first attempt to reclaim a memcg below reclaimable size threshold, - * which triggers MEMCG_LRU_TAIL; - * 4. The second attempt to reclaim a memcg below reclaimable size threshold, - * which triggers MEMCG_LRU_YOUNG; + * 3. The first attempt to reclaim a memcg offlined or below reclaimable size + * threshold, which triggers MEMCG_LRU_TAIL; + * 4. The second attempt to reclaim a memcg offlined or below reclaimable size + * threshold, which triggers MEMCG_LRU_YOUNG; * 5. Attempting to reclaim a memcg below min, which triggers MEMCG_LRU_YOUNG; * 6. Finishing the aging on the eviction path, which triggers MEMCG_LRU_YOUNG; * 7. Offlining a memcg, which triggers MEMCG_LRU_OLD. diff --git a/mm/vmscan.c b/mm/vmscan.c index cac38e9cac86..dad4b80b04cd 100644 --- a/mm/vmscan.c +++ b/mm/vmscan.c @@ -4626,7 +4626,12 @@ static bool should_run_aging(struct lruvec *lruvec, unsigned long max_seq, }
/* try to scrape all its memory if this memcg was deleted */ - *nr_to_scan = mem_cgroup_online(memcg) ? (total >> sc->priority) : total; + if (!mem_cgroup_online(memcg)) { + *nr_to_scan = total; + return false; + } + + *nr_to_scan = total >> sc->priority;
/* * The aging tries to be lazy to reduce the overhead, while the eviction @@ -4747,14 +4752,9 @@ static int shrink_one(struct lruvec *lruvec, struct scan_control *sc) bool success; unsigned long scanned = sc->nr_scanned; unsigned long reclaimed = sc->nr_reclaimed; - int seg = lru_gen_memcg_seg(lruvec); struct mem_cgroup *memcg = lruvec_memcg(lruvec); struct pglist_data *pgdat = lruvec_pgdat(lruvec);
- /* see the comment on MEMCG_NR_GENS */ - if (!lruvec_is_sizable(lruvec, sc)) - return seg != MEMCG_LRU_TAIL ? MEMCG_LRU_TAIL : MEMCG_LRU_YOUNG; - mem_cgroup_calculate_protection(NULL, memcg);
if (mem_cgroup_below_min(NULL, memcg)) @@ -4762,7 +4762,7 @@ static int shrink_one(struct lruvec *lruvec, struct scan_control *sc)
if (mem_cgroup_below_low(NULL, memcg)) { /* see the comment on MEMCG_NR_GENS */ - if (seg != MEMCG_LRU_TAIL) + if (lru_gen_memcg_seg(lruvec) != MEMCG_LRU_TAIL) return MEMCG_LRU_TAIL;
memcg_memory_event(memcg, MEMCG_LOW); @@ -4778,7 +4778,15 @@ static int shrink_one(struct lruvec *lruvec, struct scan_control *sc)
flush_reclaim_state(sc);
- return success ? MEMCG_LRU_YOUNG : 0; + if (success && mem_cgroup_online(memcg)) + return MEMCG_LRU_YOUNG; + + if (!success && lruvec_is_sizable(lruvec, sc)) + return 0; + + /* one retry if offlined or too small */ + return lru_gen_memcg_seg(lruvec) != MEMCG_LRU_TAIL ? + MEMCG_LRU_TAIL : MEMCG_LRU_YOUNG; }
#ifdef CONFIG_MEMCG