On 4/4/25 2:13 PM, Johannes Weiner wrote:
On Fri, Apr 04, 2025 at 01:25:33PM -0400, Waiman Long wrote:
On 4/4/25 1:12 PM, Tejun Heo wrote:
Hello,
On Thu, Apr 03, 2025 at 09:24:34PM -0400, Waiman Long wrote: ...
The simple and naive fix of changing the operator to ">", however, changes memory reclaim behavior in ways that can lead to other failures, since low events are needed to facilitate memory reclaim. So we can't do that without some relatively riskier changes in memory reclaim.
I'm doubtful using ">" would change reclaim behavior in a meaningful way and that'd be more straightforward. What do mm people think?
The knob documentation uses "within low" and "above low" to distinguish whether you are protected or not, so at least from a code clarity pov, >= makes more sense to me: if your protection is N and you use exactly N, you're considered protected.
That also means that by definition an empty cgroup is protected. It's not in excess of its protection. The test result isn't wrong.
The real weirdness is issuing a "low reclaim" event when no reclaim is going to happen*.
The patch effectively special cases "empty means in excess" to avoid the event and fall through to reclaim, which then does nothing as a result of its own scan target calculations. That seems convoluted.
Why not skip empty cgroups before running inapplicable checks?
diff --git a/mm/vmscan.c b/mm/vmscan.c
index b620d74b0f66..260ab238ec22 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -5963,6 +5963,9 @@ static void shrink_node_memcgs(pg_data_t *pgdat, struct scan_control *sc)
 
 		mem_cgroup_calculate_protection(target_memcg, memcg);
 
+		if (!mem_cgroup_usage(memcg, false))
+			continue;
+
 		if (mem_cgroup_below_min(target_memcg, memcg)) {
 			/*
 			 * Hard protection.
Yes, that should take care of the case of a memcg with no tasks.
I haven't looked deeply into why that is the case, but the test_memcg_low/min tests had other failures when I made this change.
It surprises me as well that it makes any practical difference.
I looked at it again and the failure is the same expected memory.current check in test_memcontrol. If I remove the equal sign, I get errors like:
values_close: child 0 = 8339456, 29MB = 30408704 failed with err = 21
not ok 1 test_memcg_min
So the test is expecting memory.current to be around 29MB, but it got a lot less (~8MB) in this case. Before removing the equality sign, I usually got about 25MB and above for child 0. That is a pretty big change in behavior, so I didn't make that change.
* Waiman points out that the weirdness is seeing low events without having a low configured. Eh, this isn't really true with recursive propagation; you may or may not have an elow depending on parental configuration and sibling behavior.
Do you mind if we just don't update the low event count if low isn't set, but leave the rest the same, like:
diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index 91721c8862c3..48a8bfa7d337 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -659,21 +659,25 @@ static inline bool mem_cgroup_unprotected(struct mem_cgro>
 static inline bool mem_cgroup_below_low(struct mem_cgroup *target,
 					struct mem_cgroup *memcg)
 {
+	unsigned long elow;
+
 	if (mem_cgroup_unprotected(target, memcg))
 		return false;
 
-	return READ_ONCE(memcg->memory.elow) >=
-		page_counter_read(&memcg->memory);
+	elow = READ_ONCE(memcg->memory.elow);
+	return elow && (page_counter_read(&memcg->memory) <= elow);
 }
 
 static inline bool mem_cgroup_below_min(struct mem_cgroup *target,
 					struct mem_cgroup *memcg)
 {
+	unsigned long emin;
+
 	if (mem_cgroup_unprotected(target, memcg))
 		return false;
 
-	return READ_ONCE(memcg->memory.emin) >=
-		page_counter_read(&memcg->memory);
+	emin = READ_ONCE(memcg->memory.emin);
+	return emin && (page_counter_read(&memcg->memory) <= emin);
 }
 
 void mem_cgroup_commit_charge(struct folio *folio, struct mem_cgroup *memcg);
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 77d015d5db0c..e8c1838c7962 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -4827,7 +4827,8 @@ static int shrink_one(struct lruvec *lruvec, struct scan_>
 		if (READ_ONCE(lruvec->lrugen.seg) != MEMCG_LRU_TAIL)
 			return MEMCG_LRU_TAIL;
 
-		memcg_memory_event(memcg, MEMCG_LOW);
+		if (memcg->memory.low)
+			memcg_memory_event(memcg, MEMCG_LOW);
 	}
 
 	success = try_to_shrink_lruvec(lruvec, sc);
@@ -5902,6 +5903,9 @@ static void shrink_node_memcgs(pg_data_t *pgdat, struct s>
 
 		mem_cgroup_calculate_protection(target_memcg, memcg);
 
+		if (!mem_cgroup_usage(memcg, false))
+			continue;
+
 		if (mem_cgroup_below_min(target_memcg, memcg)) {
 			/*
 			 * Hard protection.
@@ -5919,7 +5923,8 @@ static void shrink_node_memcgs(pg_data_t *pgdat, struct s>
 			sc->memcg_low_skipped = 1;
 			continue;
 		}
-		memcg_memory_event(memcg, MEMCG_LOW);
+		if (memcg->memory.low)
+			memcg_memory_event(memcg, MEMCG_LOW);
 	}
 
 	reclaimed = sc->nr_reclaimed;
Cheers,
Longman