The cpu.max selftests (both the normal one and the nested one) test throttling by setting up cpu.max, running a cpu hog for a specified wall-clock duration, and comparing the usage_usec reported by cpu.stat against that duration: because of throttling, usage_usec should be far below it.
Currently, this is done by using values_close, which has two problems:
1. Semantic: values_close is used with an error percentage of 95%, which is not what one expects on seeing "values close". The intent it is actually going for is "values far".
2. Accuracy: the tests can pass even if usage_usec is up to around double the expected amount. That margin is far too wide for usage_usec.
Overall, this patchset improves the readability and accuracy of the cpu.max tests.
Signed-off-by: Shashank Balaji <shashank.mahadasyam@sony.com>
---
Shashank Balaji (2):
      selftests/cgroup: rename `expected` to `duration` in cpu.max tests
      selftests/cgroup: better bound in cpu.max tests

 tools/testing/selftests/cgroup/test_cpu.c | 42 ++++++++++++++++++-------------
 1 file changed, 24 insertions(+), 18 deletions(-)
---
base-commit: 66701750d5565c574af42bef0b789ce0203e3071
change-id: 20250227-kselftest-cgroup-fix-cpu-max-56619928e99b
Best regards,
usage_seconds is renamed to duration_seconds, and expected_usage_usec to duration_usec, to better reflect what these variables mean: the wall-clock duration for which the cpu hog runs.
Using `usage` for this purpose conflicts with its meaning in cpu.stat, where `usage` is the time the cgroup's processes actually get to run. In the cpu.max tests (both the normal one and the nested one) the cpu hog runs for a fixed wall-clock duration, so, because of throttling, the usage reported by cpu.stat will be lower than that duration.
After this rename, it should ring an alarm to see `values_close` being called on usage_usec and duration_usec, because they are not supposed to be close! This is fixed in the next patch.
No functional changes.
Signed-off-by: Shashank Balaji <shashank.mahadasyam@sony.com>
---
 tools/testing/selftests/cgroup/test_cpu.c | 20 ++++++++++----------
 1 file changed, 10 insertions(+), 10 deletions(-)
diff --git a/tools/testing/selftests/cgroup/test_cpu.c b/tools/testing/selftests/cgroup/test_cpu.c
index a2b50af8e9eeede0cf61d8394300cac02ccaf005..26b0df338505526cc0c5de8f4179b8ec9bad43d7 100644
--- a/tools/testing/selftests/cgroup/test_cpu.c
+++ b/tools/testing/selftests/cgroup/test_cpu.c
@@ -646,8 +646,8 @@ static int test_cpucg_max(const char *root)
 {
 	int ret = KSFT_FAIL;
 	long usage_usec, user_usec;
-	long usage_seconds = 1;
-	long expected_usage_usec = usage_seconds * USEC_PER_SEC;
+	long duration_seconds = 1;
+	long duration_usec = duration_seconds * USEC_PER_SEC;
 	char *cpucg;
 
 	cpucg = cg_name(root, "cpucg_test");
@@ -663,7 +663,7 @@ static int test_cpucg_max(const char *root)
 	struct cpu_hog_func_param param = {
 		.nprocs = 1,
 		.ts = {
-			.tv_sec = usage_seconds,
+			.tv_sec = duration_seconds,
 			.tv_nsec = 0,
 		},
 		.clock_type = CPU_HOG_CLOCK_WALL,
@@ -676,10 +676,10 @@ static int test_cpucg_max(const char *root)
 	if (user_usec <= 0)
 		goto cleanup;
 
-	if (user_usec >= expected_usage_usec)
+	if (user_usec >= duration_usec)
 		goto cleanup;
 
-	if (values_close(usage_usec, expected_usage_usec, 95))
+	if (values_close(usage_usec, duration_usec, 95))
 		goto cleanup;
 
 	ret = KSFT_PASS;
@@ -699,8 +699,8 @@ static int test_cpucg_max_nested(const char *root)
 {
 	int ret = KSFT_FAIL;
 	long usage_usec, user_usec;
-	long usage_seconds = 1;
-	long expected_usage_usec = usage_seconds * USEC_PER_SEC;
+	long duration_seconds = 1;
+	long duration_usec = duration_seconds * USEC_PER_SEC;
 	char *parent, *child;
 
 	parent = cg_name(root, "cpucg_parent");
@@ -723,7 +723,7 @@ static int test_cpucg_max_nested(const char *root)
 	struct cpu_hog_func_param param = {
 		.nprocs = 1,
 		.ts = {
-			.tv_sec = usage_seconds,
+			.tv_sec = duration_seconds,
 			.tv_nsec = 0,
 		},
 		.clock_type = CPU_HOG_CLOCK_WALL,
@@ -736,10 +736,10 @@ static int test_cpucg_max_nested(const char *root)
 	if (user_usec <= 0)
 		goto cleanup;
 
-	if (user_usec >= expected_usage_usec)
+	if (user_usec >= duration_usec)
 		goto cleanup;
 
-	if (values_close(usage_usec, expected_usage_usec, 95))
+	if (values_close(usage_usec, duration_usec, 95))
 		goto cleanup;
 
 	ret = KSFT_PASS;
The cpu.max tests (both the normal one and the nested one) set up cpu.max with a 1000 us runtime and the default period (100,000 us). A cpu hog is then run for 1 s of wall clock time. That corresponds to 10 periods, hence an expected usage of 10,000 us. We want the measured usage (as per cpu.stat) to be close to 10,000 us. Enforce this bound correctly.
Previously, this approximate-equality check was done via `!values_close(usage_usec, duration_usec, 95)`: the test passes if the absolute difference between usage_usec and duration_usec exceeds 95% of their sum. This is problematic for two reasons:
1. Semantics: on seeing `values_close` one expects the error percentage to be a small number, not 95. Such a large margin defeats the meaning of the name; the intent it actually expresses is "values far".
2. Bound too wide: The condition translates to the following expression:
   |usage_usec - duration_usec| > (usage_usec + duration_usec) * 0.95
=> 0.05 * duration_usec > 1.95 * usage_usec    (since usage_usec < duration_usec)
=> usage_usec < (0.05 / 1.95) * duration_usec ≈ 25,641 us
So, this condition passes as long as usage_usec is lower than 25,641 us, while all we want is for it to be close to 10,000 us.
To address these issues, the condition is changed so that `labs(usage_usec - expected_usage_usec) <= 2000` means pass. The meaning is now much clearer. `labs` is used instead of `values_close` because the error in usage_usec relative to expected_usage_usec is not expected to scale with either term: it comes from the cpu hog running slightly longer than the requested duration. A proportional error estimate such as `values_close` therefore does not make sense. The maximum tolerable error is set to 2000 us because, over 10 runs of this test, the maximum usage_usec observed was 11,513 us, which corresponds to an error of 1513 us.
The check on user_usec is removed because user_usec is always at most usage_usec, and usage_usec is what actually reflects the throttling.
Signed-off-by: Shashank Balaji <shashank.mahadasyam@sony.com>
---
 tools/testing/selftests/cgroup/test_cpu.c | 34 ++++++++++++++++++------------
 1 file changed, 20 insertions(+), 14 deletions(-)
diff --git a/tools/testing/selftests/cgroup/test_cpu.c b/tools/testing/selftests/cgroup/test_cpu.c
index 26b0df338505526cc0c5de8f4179b8ec9bad43d7..fcef90d2948e1344b7741214a0cdd10609069624 100644
--- a/tools/testing/selftests/cgroup/test_cpu.c
+++ b/tools/testing/selftests/cgroup/test_cpu.c
@@ -645,9 +645,8 @@ test_cpucg_nested_weight_underprovisioned(const char *root)
 static int test_cpucg_max(const char *root)
 {
 	int ret = KSFT_FAIL;
-	long usage_usec, user_usec;
+	long usage_usec, expected_usage_usec;
 	long duration_seconds = 1;
-	long duration_usec = duration_seconds * USEC_PER_SEC;
 	char *cpucg;
 
 	cpucg = cg_name(root, "cpucg_test");
@@ -672,14 +671,18 @@ static int test_cpucg_max(const char *root)
 		goto cleanup;
 
 	usage_usec = cg_read_key_long(cpucg, "cpu.stat", "usage_usec");
-	user_usec = cg_read_key_long(cpucg, "cpu.stat", "user_usec");
-	if (user_usec <= 0)
+	if (usage_usec <= 0)
 		goto cleanup;
 
-	if (user_usec >= duration_usec)
-		goto cleanup;
+	/*
+	 * Since the cpu hog is set to run as per wall clock time, it's expected to
+	 * run for 10 periods (duration_usec/default_period_usec), and in each
+	 * period, it's throttled to run for 1000 usec. So its expected usage is
+	 * 1000 * 10 = 10000 usec.
+	 */
+	expected_usage_usec = 10000;
 
-	if (values_close(usage_usec, duration_usec, 95))
+	if (labs(usage_usec - expected_usage_usec) > 2000)
 		goto cleanup;
 
 	ret = KSFT_PASS;
@@ -698,9 +701,8 @@
 static int test_cpucg_max_nested(const char *root)
 {
 	int ret = KSFT_FAIL;
-	long usage_usec, user_usec;
+	long usage_usec, expected_usage_usec;
 	long duration_seconds = 1;
-	long duration_usec = duration_seconds * USEC_PER_SEC;
 	char *parent, *child;
 
 	parent = cg_name(root, "cpucg_parent");
@@ -732,14 +734,18 @@ static int test_cpucg_max_nested(const char *root)
 		goto cleanup;
 
 	usage_usec = cg_read_key_long(child, "cpu.stat", "usage_usec");
-	user_usec = cg_read_key_long(child, "cpu.stat", "user_usec");
-	if (user_usec <= 0)
+	if (usage_usec <= 0)
 		goto cleanup;
 
-	if (user_usec >= duration_usec)
-		goto cleanup;
+	/*
+	 * Since the cpu hog is set to run as per wall clock time, it's expected to
+	 * run for 10 periods (duration_usec/default_period_usec), and in each
+	 * period, it's throttled to run for 1000 usec. So its expected usage is
+	 * 1000 * 10 = 10000 usec.
+	 */
+	expected_usage_usec = 10000;
 
-	if (values_close(usage_usec, duration_usec, 95))
+	if (labs(usage_usec - expected_usage_usec) > 2000)
 		goto cleanup;
 
 	ret = KSFT_PASS;