Hello.
I'm just flushing the patches to make memcontrol selftests check the events behavior we had consensus about (test_memcg_low fails).
(test_memcg_reclaim, test_memcg_swap_max fail for me now but it's present even before the refactoring.)
The two bigger changes are: - adjustment of the protected values to make tests succeed with the given tolerance, - both test_memcg_low and test_memcg_min check protection of memory in populated cgroups (actually as per Documentation/admin-guide/cgroup-v2.rst memory.min should not apply to empty cgroups, which is not the case currently. Therefore I unified tests with the populated case in order to to bring more broken tests).
Thanks, Michal
Changes from v2 (https://lore.kernel.org/r/20220518161859.21565-2-mkoutny@suse.com/) - rebased on mm-stable 02e34fff195d3a5f67cbb553795dc109a37d1dcf - collected acked-bys - proper Fixes: tag
Changes from v1 (https://lore.kernel.org/r/20220513171811.730-1-mkoutny@suse.com/) - fixed mis-rebase in compilation fix patch, - added review, ack tags from v1, - applied feedback from v1 (Octave script in git tree), - added one more patch extracting common parts, - rebased on mm-stable bbe832b9db2e.
Michal Koutný (5): selftests: memcg: Fix compilation selftests: memcg: Expect no low events in unprotected sibling selftests: memcg: Adjust expected reclaim values of protected cgroups selftests: memcg: Remove protection from top level memcg selftests: memcg: Factor out common parts of memory.{low,min} tests
MAINTAINERS | 1 + .../selftests/cgroup/memcg_protection.m | 89 +++++++ .../selftests/cgroup/test_memcontrol.c | 247 +++++------------- 3 files changed, 152 insertions(+), 185 deletions(-) create mode 100644 tools/testing/selftests/cgroup/memcg_protection.m
This fixes mis-applied changes from commit 72b1e03aa725 ("cgroup: account for memory_localevents in test_memcg_oom_group_leaf_events()").
Fixes: 72b1e03aa725 ("cgroup: account for memory_localevents in test_memcg_oom_group_leaf_events()") Signed-off-by: Michal Koutný mkoutny@suse.com Reviewed-by: David Vernet void@manifault.com Acked-by: Roman Gushchin roman.gushchin@linux.dev --- .../selftests/cgroup/test_memcontrol.c | 25 +++++++++++-------- 1 file changed, 14 insertions(+), 11 deletions(-)
diff --git a/tools/testing/selftests/cgroup/test_memcontrol.c b/tools/testing/selftests/cgroup/test_memcontrol.c index 6ab94317c87b..c012db9d07d6 100644 --- a/tools/testing/selftests/cgroup/test_memcontrol.c +++ b/tools/testing/selftests/cgroup/test_memcontrol.c @@ -1241,7 +1241,16 @@ static int test_memcg_oom_group_leaf_events(const char *root) if (cg_read_key_long(child, "memory.events", "oom_kill ") <= 0) goto cleanup;
- if (cg_read_key_long(parent, "memory.events", "oom_kill ") <= 0) + parent_oom_events = cg_read_key_long( + parent, "memory.events", "oom_kill "); + /* + * If memory_localevents is not enabled (the default), the parent should + * count OOM events in its children groups. Otherwise, it should not + * have observed any events. + */ + if (has_localevents && parent_oom_events != 0) + goto cleanup; + else if (!has_localevents && parent_oom_events <= 0) goto cleanup;
ret = KSFT_PASS; @@ -1349,20 +1358,14 @@ static int test_memcg_oom_group_score_events(const char *root) if (!cg_run(memcg, alloc_anon, (void *)MB(100))) goto cleanup;
- parent_oom_events = cg_read_key_long( - parent, "memory.events", "oom_kill "); - /* - * If memory_localevents is not enabled (the default), the parent should - * count OOM events in its children groups. Otherwise, it should not - * have observed any events. - */ - if ((has_localevents && parent_oom_events == 0) || - parent_oom_events > 0) - ret = KSFT_PASS; + if (cg_read_key_long(memcg, "memory.events", "oom_kill ") != 3) + goto cleanup;
if (kill(safe_pid, SIGKILL)) goto cleanup;
+ ret = KSFT_PASS; + cleanup: if (memcg) cg_destroy(memcg);
This is effectively a revert of commit cdc69458a5f3 ("cgroup: account for memory_recursiveprot in test_memcg_low()"). The case test_memcg_low will fail with memory_recursiveprot until resolved in reclaim code. However, this patch preserves the existing helpers and variables for later uses.
Signed-off-by: Michal Koutný mkoutny@suse.com Reviewed-by: David Vernet void@manifault.com --- tools/testing/selftests/cgroup/test_memcontrol.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tools/testing/selftests/cgroup/test_memcontrol.c b/tools/testing/selftests/cgroup/test_memcontrol.c index c012db9d07d6..4924425639b0 100644 --- a/tools/testing/selftests/cgroup/test_memcontrol.c +++ b/tools/testing/selftests/cgroup/test_memcontrol.c @@ -528,7 +528,7 @@ static int test_memcg_low(const char *root) }
for (i = 0; i < ARRAY_SIZE(children); i++) { - int no_low_events_index = has_recursiveprot ? 2 : 1; + int no_low_events_index = 1;
oom = cg_read_key_long(children[i], "memory.events", "oom "); low = cg_read_key_long(children[i], "memory.events", "low ");
On Tue, May 24, 2022 at 06:29:52PM +0200, Michal Koutny wrote:
This is effectively a revert of commit cdc69458a5f3 ("cgroup: account for memory_recursiveprot in test_memcg_low()"). The case test_memcg_low will fail with memory_recursiveprot until resolved in reclaim code. However, this patch preserves the existing helpers and variables for later uses.
Signed-off-by: Michal Koutný mkoutny@suse.com Reviewed-by: David Vernet void@manifault.com
Acked-by: Roman Gushchin roman.gushchin@linux.dev
The numbers are not easy to derive in a closed form (certainly mere protections ratios do not apply), therefore use a simulation to obtain expected numbers.
Signed-off-by: Michal Koutný mkoutny@suse.com Acked-by: Roman Gushchin roman.gushchin@linux.dev --- MAINTAINERS | 1 + .../selftests/cgroup/memcg_protection.m | 89 +++++++++++++++++++ .../selftests/cgroup/test_memcontrol.c | 29 +++--- 3 files changed, 107 insertions(+), 12 deletions(-) create mode 100644 tools/testing/selftests/cgroup/memcg_protection.m
diff --git a/MAINTAINERS b/MAINTAINERS index 78c57046fa93..b28b6aeb8636 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -5029,6 +5029,7 @@ L: linux-mm@kvack.org S: Maintained F: mm/memcontrol.c F: mm/swap_cgroup.c +F: tools/testing/selftests/cgroup/memcg_protection.m F: tools/testing/selftests/cgroup/test_kmem.c F: tools/testing/selftests/cgroup/test_memcontrol.c
diff --git a/tools/testing/selftests/cgroup/memcg_protection.m b/tools/testing/selftests/cgroup/memcg_protection.m new file mode 100644 index 000000000000..051daa3477b6 --- /dev/null +++ b/tools/testing/selftests/cgroup/memcg_protection.m @@ -0,0 +1,89 @@ +% SPDX-License-Identifier: GPL-2.0 +% +% run as: octave-cli memcg_protection.m +% +% This script simulates reclaim protection behavior on a single level of memcg +% hierarchy to illustrate how overcommitted protection spreads among siblings +% (as it depends also on their current consumption). +% +% Simulation assumes siblings consumed the initial amount of memory (w/out +% reclaim) and then the reclaim starts, all memory is reclaimable, i.e. treated +% same. It simulates only non-low reclaim and assumes all memory.min = 0. +% +% Input configurations +% -------------------- +% E number parent effective protection +% n vector nominal protection of siblings set at the given level (memory.low) +% c vector current consumption -,,- (memory.current) + +% example from testcase (values in GB) +E = 50 / 1024; +n = [75 25 0 500 ] / 1024; +c = [50 50 50 0] / 1024; + +% Reclaim parameters +% ------------------ + +% Minimal reclaim amount (GB) +cluster = 32*4 / 2**20; + +% Reclaim coefficient (think as 0.5^sc->priority) +alpha = .1 + +% Simulation parameters +% --------------------- +epsilon = 1e-7; +timeout = 1000; + +% Simulation loop +% --------------- + +ch = []; +eh = []; +rh = []; + +for t = 1:timeout + % low_usage + u = min(c, n); + siblings = sum(u); + + % effective_protection() + protected = min(n, c); % start with nominal + e = protected * min(1, E / siblings); % normalize overcommit + + % recursive protection + unclaimed = max(0, E - siblings); + parent_overuse = sum(c) - siblings; + if (unclaimed > 0 && parent_overuse > 0) + overuse = max(0, c - protected); + e += unclaimed * (overuse / parent_overuse); + endif + + % get_scan_count() + r = alpha * c; % assume all memory is in a single LRU list + + % commit 1bc63fb1272b ("mm, memcg: make scan aggression always exclude protection") + sz = max(e, c); + r .*= (1 - (e+epsilon) ./ (sz+epsilon)); + + % uncomment to debug prints + % e, c, r + + % nothing to reclaim, reached equilibrium + if max(r) < epsilon + break; + endif + + % SWAP_CLUSTER_MAX roundup + r = max(r, (r > epsilon) .* cluster); + % XXX here I do parallel reclaim of all siblings + % in reality reclaim is serialized and each sibling recalculates own residual + c = max(c - r, 0); + + ch = [ch ; c]; + eh = [eh ; e]; + rh = [rh ; r]; +endfor + +t +c, e diff --git a/tools/testing/selftests/cgroup/test_memcontrol.c b/tools/testing/selftests/cgroup/test_memcontrol.c index 4924425639b0..dc2c7d6e3572 100644 --- a/tools/testing/selftests/cgroup/test_memcontrol.c +++ b/tools/testing/selftests/cgroup/test_memcontrol.c @@ -248,7 +248,7 @@ static int cg_test_proc_killed(const char *cgroup) /* * First, this test creates the following hierarchy: * A memory.min = 50M, memory.max = 200M - * A/B memory.min = 50M, memory.current = 50M + * A/B memory.min = 50M * A/B/C memory.min = 75M, memory.current = 50M * A/B/D memory.min = 25M, memory.current = 50M * A/B/E memory.min = 0, memory.current = 50M @@ -259,10 +259,13 @@ static int cg_test_proc_killed(const char *cgroup) * Then it creates A/G and creates a significant * memory pressure in it. * + * Then it checks actual memory usages and expects that: * A/B memory.current ~= 50M - * A/B/C memory.current ~= 33M - * A/B/D memory.current ~= 17M - * A/B/F memory.current ~= 0 + * A/B/C memory.current ~= 29M + * A/B/D memory.current ~= 21M + * A/B/E memory.current ~= 0 + * A/B/F memory.current = 0 + * (for origin of the numbers, see model in memcg_protection.m.) * * After that it tries to allocate more than there is * unprotected memory in A available, and checks @@ -365,10 +368,10 @@ static int test_memcg_min(const char *root) for (i = 0; i < ARRAY_SIZE(children); i++) c[i] = cg_read_long(children[i], "memory.current");
- if (!values_close(c[0], MB(33), 10)) + if (!values_close(c[0], MB(29), 10)) goto cleanup;
- if (!values_close(c[1], MB(17), 10)) + if (!values_close(c[1], MB(21), 10)) goto cleanup;
if (c[3] != 0) @@ -405,7 +408,7 @@ static int test_memcg_min(const char *root) /* * First, this test creates the following hierarchy: * A memory.low = 50M, memory.max = 200M - * A/B memory.low = 50M, memory.current = 50M + * A/B memory.low = 50M * A/B/C memory.low = 75M, memory.current = 50M * A/B/D memory.low = 25M, memory.current = 50M * A/B/E memory.low = 0, memory.current = 50M @@ -417,9 +420,11 @@ static int test_memcg_min(const char *root) * * Then it checks actual memory usages and expects that: * A/B memory.current ~= 50M - * A/B/ memory.current ~= 33M - * A/B/D memory.current ~= 17M - * A/B/F memory.current ~= 0 + * A/B/C memory.current ~= 29M + * A/B/D memory.current ~= 21M + * A/B/E memory.current ~= 0 + * A/B/F memory.current = 0 + * (for origin of the numbers, see model in memcg_protection.m.) * * After that it tries to allocate more than there is * unprotected memory in A available, @@ -512,10 +517,10 @@ static int test_memcg_low(const char *root) for (i = 0; i < ARRAY_SIZE(children); i++) c[i] = cg_read_long(children[i], "memory.current");
- if (!values_close(c[0], MB(33), 10)) + if (!values_close(c[0], MB(29), 10)) goto cleanup;
- if (!values_close(c[1], MB(17), 10)) + if (!values_close(c[1], MB(21), 10)) goto cleanup;
if (c[3] != 0)
The reclaim is triggered by memory limit in a subtree, therefore the testcase does not need configured protection against external reclaim.
Also, correct respective comments
Signed-off-by: Michal Koutný mkoutny@suse.com Acked-by: Roman Gushchin roman.gushchin@linux.dev --- tools/testing/selftests/cgroup/test_memcontrol.c | 10 +++------- 1 file changed, 3 insertions(+), 7 deletions(-)
diff --git a/tools/testing/selftests/cgroup/test_memcontrol.c b/tools/testing/selftests/cgroup/test_memcontrol.c index dc2c7d6e3572..63c6a683a8c1 100644 --- a/tools/testing/selftests/cgroup/test_memcontrol.c +++ b/tools/testing/selftests/cgroup/test_memcontrol.c @@ -247,7 +247,7 @@ static int cg_test_proc_killed(const char *cgroup)
/* * First, this test creates the following hierarchy: - * A memory.min = 50M, memory.max = 200M + * A memory.min = 0, memory.max = 200M * A/B memory.min = 50M * A/B/C memory.min = 75M, memory.current = 50M * A/B/D memory.min = 25M, memory.current = 50M @@ -257,7 +257,7 @@ static int cg_test_proc_killed(const char *cgroup) * Usages are pagecache, but the test keeps a running * process in every leaf cgroup. * Then it creates A/G and creates a significant - * memory pressure in it. + * memory pressure in A. * * Then it checks actual memory usages and expects that: * A/B memory.current ~= 50M @@ -338,8 +338,6 @@ static int test_memcg_min(const char *root) (void *)(long)fd); }
- if (cg_write(parent[0], "memory.min", "50M")) - goto cleanup; if (cg_write(parent[1], "memory.min", "50M")) goto cleanup; if (cg_write(children[0], "memory.min", "75M")) @@ -407,7 +405,7 @@ static int test_memcg_min(const char *root)
/* * First, this test creates the following hierarchy: - * A memory.low = 50M, memory.max = 200M + * A memory.low = 0, memory.max = 200M * A/B memory.low = 50M * A/B/C memory.low = 75M, memory.current = 50M * A/B/D memory.low = 25M, memory.current = 50M @@ -495,8 +493,6 @@ static int test_memcg_low(const char *root) goto cleanup; }
- if (cg_write(parent[0], "memory.low", "50M")) - goto cleanup; if (cg_write(parent[1], "memory.low", "50M")) goto cleanup; if (cg_write(children[0], "memory.low", "75M"))
The memory protection test setup and runtime is almost equal for memory.low and memory.min cases. It makes modification of the common parts prone to mistakes, since the protections are similar not only in setup but also in principle, factor the common part out.
Past exceptions between the tests: - missing memory.min is fine (kept), - test_memcg_low protected orphaned pagecache (adapted like test_memcg_min and we keep the processes of protected memory running).
The evaluation in two tests is different (OOM of allocator vs low events of protégés), this is kept different.
Signed-off-by: Michal Koutný mkoutny@suse.com --- .../selftests/cgroup/test_memcontrol.c | 199 ++++-------------- 1 file changed, 36 insertions(+), 163 deletions(-)
diff --git a/tools/testing/selftests/cgroup/test_memcontrol.c b/tools/testing/selftests/cgroup/test_memcontrol.c index 63c6a683a8c1..c3d0d5f7b19c 100644 --- a/tools/testing/selftests/cgroup/test_memcontrol.c +++ b/tools/testing/selftests/cgroup/test_memcontrol.c @@ -190,13 +190,6 @@ static int test_memcg_current(const char *root) return ret; }
-static int alloc_pagecache_50M(const char *cgroup, void *arg) -{ - int fd = (long)arg; - - return alloc_pagecache(fd, MB(50)); -} - static int alloc_pagecache_50M_noexit(const char *cgroup, void *arg) { int fd = (long)arg; @@ -254,7 +247,9 @@ static int cg_test_proc_killed(const char *cgroup) * A/B/E memory.min = 0, memory.current = 50M * A/B/F memory.min = 500M, memory.current = 0 * - * Usages are pagecache, but the test keeps a running + * (or memory.low if we test soft protection) + * + * Usages are pagecache and the test keeps a running * process in every leaf cgroup. * Then it creates A/G and creates a significant * memory pressure in A. @@ -268,15 +263,16 @@ static int cg_test_proc_killed(const char *cgroup) * (for origin of the numbers, see model in memcg_protection.m.) * * After that it tries to allocate more than there is - * unprotected memory in A available, and checks - * checks that memory.min protects pagecache even - * in this case. + * unprotected memory in A available, and checks that: + * a) memory.min protects pagecache even in this case, + * b) memory.low allows reclaiming page cache with low events. */ -static int test_memcg_min(const char *root) +static int test_memcg_protection(const char *root, bool min) { - int ret = KSFT_FAIL; + int ret = KSFT_FAIL, rc; char *parent[3] = {NULL}; char *children[4] = {NULL}; + const char *attribute = min ? "memory.min" : "memory.low"; long c[4]; int i, attempts; int fd; @@ -300,8 +296,10 @@ static int test_memcg_min(const char *root) if (cg_create(parent[0])) goto cleanup;
- if (cg_read_long(parent[0], "memory.min")) { - ret = KSFT_SKIP; + if (cg_read_long(parent[0], attribute)) { + /* No memory.min on older kernels is fine */ + if (min) + ret = KSFT_SKIP; goto cleanup; }
@@ -338,15 +336,15 @@ static int test_memcg_min(const char *root) (void *)(long)fd); }
- if (cg_write(parent[1], "memory.min", "50M")) + if (cg_write(parent[1], attribute, "50M")) goto cleanup; - if (cg_write(children[0], "memory.min", "75M")) + if (cg_write(children[0], attribute, "75M")) goto cleanup; - if (cg_write(children[1], "memory.min", "25M")) + if (cg_write(children[1], attribute, "25M")) goto cleanup; - if (cg_write(children[2], "memory.min", "0")) + if (cg_write(children[2], attribute, "0")) goto cleanup; - if (cg_write(children[3], "memory.min", "500M")) + if (cg_write(children[3], attribute, "500M")) goto cleanup;
attempts = 0; @@ -375,161 +373,26 @@ static int test_memcg_min(const char *root) if (c[3] != 0) goto cleanup;
- if (!cg_run(parent[2], alloc_anon, (void *)MB(170))) - goto cleanup; - - if (!values_close(cg_read_long(parent[1], "memory.current"), MB(50), 3)) - goto cleanup; - - ret = KSFT_PASS; - -cleanup: - for (i = ARRAY_SIZE(children) - 1; i >= 0; i--) { - if (!children[i]) - continue; - - cg_destroy(children[i]); - free(children[i]); - } - - for (i = ARRAY_SIZE(parent) - 1; i >= 0; i--) { - if (!parent[i]) - continue; - - cg_destroy(parent[i]); - free(parent[i]); - } - close(fd); - return ret; -} - -/* - * First, this test creates the following hierarchy: - * A memory.low = 0, memory.max = 200M - * A/B memory.low = 50M - * A/B/C memory.low = 75M, memory.current = 50M - * A/B/D memory.low = 25M, memory.current = 50M - * A/B/E memory.low = 0, memory.current = 50M - * A/B/F memory.low = 500M, memory.current = 0 - * - * Usages are pagecache. - * Then it creates A/G an creates a significant - * memory pressure in it. - * - * Then it checks actual memory usages and expects that: - * A/B memory.current ~= 50M - * A/B/C memory.current ~= 29M - * A/B/D memory.current ~= 21M - * A/B/E memory.current ~= 0 - * A/B/F memory.current = 0 - * (for origin of the numbers, see model in memcg_protection.m.) - * - * After that it tries to allocate more than there is - * unprotected memory in A available, - * and checks low and oom events in memory.events. - */ -static int test_memcg_low(const char *root) -{ - int ret = KSFT_FAIL; - char *parent[3] = {NULL}; - char *children[4] = {NULL}; - long low, oom; - long c[4]; - int i; - int fd; - - fd = get_temp_fd(); - if (fd < 0) - goto cleanup; - - parent[0] = cg_name(root, "memcg_test_0"); - if (!parent[0]) - goto cleanup; - - parent[1] = cg_name(parent[0], "memcg_test_1"); - if (!parent[1]) - goto cleanup; - - parent[2] = cg_name(parent[0], "memcg_test_2"); - if (!parent[2]) - goto cleanup; - - if (cg_create(parent[0])) - goto cleanup; - - if (cg_read_long(parent[0], "memory.low")) - goto cleanup; - - if (cg_write(parent[0], "cgroup.subtree_control", "+memory")) + rc = cg_run(parent[2], alloc_anon, (void *)MB(170)); + if (min && !rc) goto cleanup; - - if (cg_write(parent[0], "memory.max", "200M")) - goto cleanup; - - if (cg_write(parent[0], "memory.swap.max", "0")) - goto cleanup; - - if (cg_create(parent[1])) - goto cleanup; - - if (cg_write(parent[1], "cgroup.subtree_control", "+memory")) - goto cleanup; - - if (cg_create(parent[2])) + else if (!min && rc) { + fprintf(stderr, + "memory.low prevents from allocating anon memory\n"); goto cleanup; - - for (i = 0; i < ARRAY_SIZE(children); i++) { - children[i] = cg_name_indexed(parent[1], "child_memcg", i); - if (!children[i]) - goto cleanup; - - if (cg_create(children[i])) - goto cleanup; - - if (i > 2) - continue; - - if (cg_run(children[i], alloc_pagecache_50M, (void *)(long)fd)) - goto cleanup; }
- if (cg_write(parent[1], "memory.low", "50M")) - goto cleanup; - if (cg_write(children[0], "memory.low", "75M")) - goto cleanup; - if (cg_write(children[1], "memory.low", "25M")) - goto cleanup; - if (cg_write(children[2], "memory.low", "0")) - goto cleanup; - if (cg_write(children[3], "memory.low", "500M")) - goto cleanup; - - if (cg_run(parent[2], alloc_anon, (void *)MB(148))) - goto cleanup; - if (!values_close(cg_read_long(parent[1], "memory.current"), MB(50), 3)) goto cleanup;
- for (i = 0; i < ARRAY_SIZE(children); i++) - c[i] = cg_read_long(children[i], "memory.current"); - - if (!values_close(c[0], MB(29), 10)) - goto cleanup; - - if (!values_close(c[1], MB(21), 10)) - goto cleanup; - - if (c[3] != 0) - goto cleanup; - - if (cg_run(parent[2], alloc_anon, (void *)MB(166))) { - fprintf(stderr, - "memory.low prevents from allocating anon memory\n"); + if (min) { + ret = KSFT_PASS; goto cleanup; }
for (i = 0; i < ARRAY_SIZE(children); i++) { int no_low_events_index = 1; + long low, oom;
oom = cg_read_key_long(children[i], "memory.events", "oom "); low = cg_read_key_long(children[i], "memory.events", "low "); @@ -565,6 +428,16 @@ static int test_memcg_low(const char *root) return ret; }
+static int test_memcg_min(const char *root) +{ + return test_memcg_protection(root, true); +} + +static int test_memcg_low(const char *root) +{ + return test_memcg_protection(root, false); +} + static int alloc_pagecache_max_30M(const char *cgroup, void *arg) { size_t size = MB(50);
On Tue, May 24, 2022 at 06:29:55PM +0200, Michal Koutny wrote:
The memory protection test setup and runtime is almost equal for memory.low and memory.min cases. It makes modification of the common parts prone to mistakes, since the protections are similar not only in setup but also in principle, factor the common part out.
Past exceptions between the tests:
- missing memory.min is fine (kept),
- test_memcg_low protected orphaned pagecache (adapted like test_memcg_min and we keep the processes of protected memory running).
The evaluation in two tests is different (OOM of allocator vs low events of protégés), this is kept different.
Signed-off-by: Michal Koutný mkoutny@suse.com
Acked-by: Roman Gushchin roman.gushchin@linux.dev
linux-kselftest-mirror@lists.linaro.org