Changelog:
v6:
* Rebase on top of latest mm-unstable.
* Fix/improve the in-code documentation of the new list_lru
manipulation functions (patch 1)
v5:
* Replace reference getting with an rcu_read_lock() section for
zswap lru modifications (suggested by Yosry)
* Add a new prep patch that allows mem_cgroup_iter() to return
online cgroup.
* Add a callback that updates pool->next_shrink when the cgroup is
offlined (suggested by Yosry Ahmed, Johannes Weiner)
v4:
* Rename list_lru_add to list_lru_add_obj and __list_lru_add to
list_lru_add (patch 1) (suggested by Johannes Weiner and
Yosry Ahmed)
* Some cleanups on the memcg aware LRU patch (patch 2)
(suggested by Yosry Ahmed)
* Use event interface for the new per-cgroup writeback counters.
(patch 3) (suggested by Yosry Ahmed)
* Abstract zswap's lruvec states and handling into
zswap_lruvec_state (patch 5) (suggested by Yosry Ahmed)
v3:
* Add a patch to export per-cgroup zswap writeback counters
* Add a patch to update zswap's kselftest
* Separate the new list_lru functions into its own prep patch
* Do not start from the top of the hierarchy when encounter a memcg
that is not online for the global limit zswap writeback (patch 2)
(suggested by Yosry Ahmed)
* Do not remove the swap entry from list_lru in
__read_swapcache_async() (patch 2) (suggested by Yosry Ahmed)
* Removed a redundant zswap pool getting (patch 2)
(reported by Ryan Roberts)
* Use atomic for the nr_zswap_protected (instead of lruvec's lock)
(patch 5) (suggested by Yosry Ahmed)
* Remove the per-cgroup zswap shrinker knob (patch 5)
(suggested by Yosry Ahmed)
v2:
* Fix loongarch compiler errors
* Use pool stats instead of memcg stats when !CONFIG_MEMCG_KEM
There are currently several issues with zswap writeback:
1. There is only a single global LRU for zswap, making it impossible to
perform worload-specific shrinking - an memcg under memory pressure
cannot determine which pages in the pool it owns, and often ends up
writing pages from other memcgs. This issue has been previously
observed in practice and mitigated by simply disabling
memcg-initiated shrinking:
https://lore.kernel.org/all/20230530232435.3097106-1-nphamcs@gmail.com/T/#u
But this solution leaves a lot to be desired, as we still do not
have an avenue for an memcg to free up its own memory locked up in
the zswap pool.
2. We only shrink the zswap pool when the user-defined limit is hit.
This means that if we set the limit too high, cold data that are
unlikely to be used again will reside in the pool, wasting precious
memory. It is hard to predict how much zswap space will be needed
ahead of time, as this depends on the workload (specifically, on
factors such as memory access patterns and compressibility of the
memory pages).
This patch series solves these issues by separating the global zswap
LRU into per-memcg and per-NUMA LRUs, and performs workload-specific
(i.e memcg- and NUMA-aware) zswap writeback under memory pressure. The
new shrinker does not have any parameter that must be tuned by the
user, and can be opted in or out on a per-memcg basis.
As a proof of concept, we ran the following synthetic benchmark:
build the linux kernel in a memory-limited cgroup, and allocate some
cold data in tmpfs to see if the shrinker could write them out and
improved the overall performance. Depending on the amount of cold data
generated, we observe from 14% to 35% reduction in kernel CPU time used
in the kernel builds.
Domenico Cerasuolo (3):
zswap: make shrinking memcg-aware
mm: memcg: add per-memcg zswap writeback stat
selftests: cgroup: update per-memcg zswap writeback selftest
Nhat Pham (3):
list_lru: allows explicit memcg and NUMA node selection
memcontrol: allows mem_cgroup_iter() to check for onlineness
zswap: shrinks zswap pool based on memory pressure
Documentation/admin-guide/mm/zswap.rst | 7 +
drivers/android/binder_alloc.c | 5 +-
fs/dcache.c | 8 +-
fs/gfs2/quota.c | 6 +-
fs/inode.c | 4 +-
fs/nfs/nfs42xattr.c | 8 +-
fs/nfsd/filecache.c | 4 +-
fs/xfs/xfs_buf.c | 6 +-
fs/xfs/xfs_dquot.c | 2 +-
fs/xfs/xfs_qm.c | 2 +-
include/linux/list_lru.h | 54 ++-
include/linux/memcontrol.h | 9 +-
include/linux/mmzone.h | 2 +
include/linux/vm_event_item.h | 1 +
include/linux/zswap.h | 27 +-
mm/list_lru.c | 48 ++-
mm/memcontrol.c | 20 +-
mm/mmzone.c | 1 +
mm/shrinker.c | 4 +-
mm/swap.h | 3 +-
mm/swap_state.c | 26 +-
mm/vmscan.c | 26 +-
mm/vmstat.c | 1 +
mm/workingset.c | 4 +-
mm/zswap.c | 426 +++++++++++++++++---
tools/testing/selftests/cgroup/test_zswap.c | 74 ++--
26 files changed, 629 insertions(+), 149 deletions(-)
base-commit: 40b487ae2620fc9187fee68b09d2cb275de0d60e
--
2.34.1
The test is inspired by the pmu_event_filter_test which implemented by x86. On
the arm64 platform, there is the same ability to set the pmu_event_filter
through the KVM_ARM_VCPU_PMU_V3_FILTER attribute. So add the test for arm64.
The series first move some pmu common code from vpmu_counter_access to lib/
which can be used by pmu_event_filter_test. Then implements the test itself.
Shaoqin Huang (3):
KVM: selftests: aarch64: Make the [create|destroy]_vpmu_vm() can be
reused
KVM: selftests: aarch64: Move the pmu helper function into lib/
KVM: selftests: aarch64: Introduce pmu_event_filter_test
tools/testing/selftests/kvm/Makefile | 2 +
.../kvm/aarch64/pmu_event_filter_test.c | 227 ++++++++++++++++++
.../kvm/aarch64/vpmu_counter_access.c | 218 ++---------------
.../selftests/kvm/include/aarch64/vpmu.h | 139 +++++++++++
.../testing/selftests/kvm/lib/aarch64/vpmu.c | 74 ++++++
5 files changed, 466 insertions(+), 194 deletions(-)
create mode 100644 tools/testing/selftests/kvm/aarch64/pmu_event_filter_test.c
create mode 100644 tools/testing/selftests/kvm/include/aarch64/vpmu.h
create mode 100644 tools/testing/selftests/kvm/lib/aarch64/vpmu.c
--
2.40.1
When linking statically, libraries may require other dependencies to be
included to ld flags. In particular, libelf may require libzstd. Use
pkg-config to determine such dependencies.
V4 -> V5: Introduced variables LIBELF_CFLAGS and LIBELF_LIBS.
(Daniel Borkmann)
Added patch "selftests/bpf: Choose pkg-config for the target".
V3 -> V4: Added "2> /dev/null".
V2 -> V3: Added missing "echo".
V1 -> V2: Implemented fallback, referring to HOSTPKG_CONFIG.
Akihiko Odaki (3):
selftests/bpf: Choose pkg-config for the target
selftests/bpf: Override PKG_CONFIG for static builds
selftests/bpf: Use pkg-config for libelf
tools/testing/selftests/bpf/Makefile | 14 +++++++++-----
tools/testing/selftests/bpf/README.rst | 2 +-
2 files changed, 10 insertions(+), 6 deletions(-)
--
2.43.0
From: angquan yu <angquan21(a)gmail.com>
This commit addresses compiler warnings in lam.c related to the usage
of non-literal format strings without format arguments in the
'run_test' function.
Warnings fixed:
- Resolved warnings indicating that 'ksft_test_result_skip' and
'ksft_test_result' were called with 't->msg' as a format string without
accompanying format arguments.
Changes made:
- Modified the calls to 'ksft_test_result_skip' and 'ksft_test_result'
to explicitly include a format specifier ("%s") for 't->msg'.
- This ensures that the string is safely treated as a format argument,
adhering to safer coding practices and resolving the compiler warnings.
Signed-off-by: angquan yu <angquan21(a)gmail.com>
---
tools/testing/selftests/x86/lam.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/tools/testing/selftests/x86/lam.c b/tools/testing/selftests/x86/lam.c
index 8f9b06d9c..215b8150b 100644
--- a/tools/testing/selftests/x86/lam.c
+++ b/tools/testing/selftests/x86/lam.c
@@ -817,7 +817,7 @@ static void run_test(struct testcases *test, int count)
/* return 3 is not support LA57, the case should be skipped */
if (ret == 3) {
- ksft_test_result_skip(t->msg);
+ ksft_test_result_skip("%s", t->msg);
continue;
}
@@ -826,7 +826,7 @@ static void run_test(struct testcases *test, int count)
else
ret = !(t->expected);
- ksft_test_result(ret, t->msg);
+ ksft_test_result(ret, "%s", t->msg);
}
}
--
2.39.2
From: angquan yu <angquan21(a)gmail.com>
This commit fixes a compiler warning in the file
x86_64/nx_huge_pages_test.c, which was caused by improper
macro expansion of '__TEST_REQUIRE'.
Warning addressed:
- The warning was triggered by the expansion of the '__TEST_REQUIRE'
macro, indicating a potential issue in how the macro was being
used or expanded.
Changes made:
- Modified the usage of the '__TEST_REQUIRE' macro to ensure proper
expansion. This involved explicitly passing the expected magic token
(MAGIC_TOKEN) and a descriptive error message to the macro.
- The fix enhances clarity in the macro usage and ensures that
the compiler correctly interprets the intended logic, thereby
resolving the warning.
Signed-off-by: angquan yu <angquan21(a)gmail.com>
---
tools/testing/selftests/kvm/x86_64/nx_huge_pages_test.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/tools/testing/selftests/kvm/x86_64/nx_huge_pages_test.c b/tools/testing/selftests/kvm/x86_64/nx_huge_pages_test.c
index 18ac5c195..323ede6b6 100644
--- a/tools/testing/selftests/kvm/x86_64/nx_huge_pages_test.c
+++ b/tools/testing/selftests/kvm/x86_64/nx_huge_pages_test.c
@@ -259,7 +259,8 @@ int main(int argc, char **argv)
__TEST_REQUIRE(token == MAGIC_TOKEN,
"This test must be run with the magic token %d.\n"
"This is done by nx_huge_pages_test.sh, which\n"
- "also handles environment setup for the test.");
+ "also handles environment setup for the test.",
+ MAGIC_TOKEN);
run_test(reclaim_period_ms, false, reboot_permissions);
run_test(reclaim_period_ms, true, reboot_permissions);
--
2.39.2
From: angquan yu <angquan21(a)gmail.com>
This commit addresses compiler warnings in lam.c related to the usage
of non-literal format strings without format arguments in the
'run_test' function.
Warnings fixed:
- Resolved warnings indicating that 'ksft_test_result_skip' and
'ksft_test_result' were called with 't->msg' as a format string without
accompanying format arguments.
Changes made:
- Modified the calls to 'ksft_test_result_skip' and 'ksft_test_result'
to explicitly include a format specifier ("%s") for 't->msg'.
- This ensures that the string is safely treated as a format argument,
adhering to safer coding practices and resolving the compiler warnings.
Signed-off-by: angquan yu <angquan21(a)gmail.com>
---
tools/testing/selftests/x86/lam.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/tools/testing/selftests/x86/lam.c b/tools/testing/selftests/x86/lam.c
index 8f9b06d9c..215b8150b 100644
--- a/tools/testing/selftests/x86/lam.c
+++ b/tools/testing/selftests/x86/lam.c
@@ -817,7 +817,7 @@ static void run_test(struct testcases *test, int count)
/* return 3 is not support LA57, the case should be skipped */
if (ret == 3) {
- ksft_test_result_skip(t->msg);
+ ksft_test_result_skip("%s", t->msg);
continue;
}
@@ -826,7 +826,7 @@ static void run_test(struct testcases *test, int count)
else
ret = !(t->expected);
- ksft_test_result(ret, t->msg);
+ ksft_test_result(ret, "%s", t->msg);
}
}
--
2.39.2
From: angquan yu <angquan21(a)gmail.com>
This commit resolves a compiler warning regardingthe
use of non-literal format strings in breakpoint_test.c.
The functions `ksft_test_result_pass` and `ksft_test_result_fail`
were previously called with a variable `msg` directly, which could
potentially lead to format string vulnerabilities.
Changes made:
- Modified the calls to `ksft_test_result_pass` and `ksft_test_result_fail`
by adding a "%s" format specifier. This explicitly declares `msg` as a
string argument, adhering to safer coding practices and resolving
the compiler warning.
This change does not affect the functional behavior of the code but ensures
better code safety and compliance with recommended C programming standards.
The previous warning is "breakpoint_test.c:287:17:
warning: format not a string literal and no format arguments
[-Wformat-security]
287 | ksft_test_result_pass(msg);
| ^~~~~~~~~~~~~~~~~~~~~
breakpoint_test.c:289:17: warning: format not a string literal
and no format arguments [-Wformat-security]
289 | ksft_test_result_fail(msg);
| "
Signed-off-by: angquan yu <angquan21(a)gmail.com>
---
tools/testing/selftests/breakpoints/breakpoint_test.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/tools/testing/selftests/breakpoints/breakpoint_test.c b/tools/testing/selftests/breakpoints/breakpoint_test.c
index 3266cc929..d46962a24 100644
--- a/tools/testing/selftests/breakpoints/breakpoint_test.c
+++ b/tools/testing/selftests/breakpoints/breakpoint_test.c
@@ -284,9 +284,9 @@ static void check_success(const char *msg)
nr_tests++;
if (ret)
- ksft_test_result_pass(msg);
+ ksft_test_result_pass("%s", msg);
else
- ksft_test_result_fail(msg);
+ ksft_test_result_fail("%s", msg);
}
static void launch_instruction_breakpoints(char *buf, int local, int global)
--
2.39.2
From: angquan yu <angquan21(a)gmail.com>
In tools/testing/selftests/proc/proc-empty->because the return value
of a write call was being ignored. This call was partof a conditional
debugging block (if (0) { ... }), which meant it would neveractually
execute.
This patch removes the unused debug write call. This cleanup resolves
the compi>warning about ignoring the result of write declared with
the warn_unused_resultattribute.
Removing this code also improves the clarity and maintainability of
the function, as it eliminates a non-functional block of code.
This is original warning: proc-empty-vm.c: In function
‘test_proc_pid_statm’ :proc-empty-vm.c:385:17:
warning: ignoring return value of ‘write’
declared with>385 | write(1, buf, rv);|
Signed-off-by: angquan yu <angquan21(a)gmail.com>
---
tools/testing/selftests/proc/proc-empty-vm.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/tools/testing/selftests/proc/proc-empty-vm.c b/tools/testing/selftests/proc/proc-empty-vm.c
index 5e7020630..d231e61e4 100644
--- a/tools/testing/selftests/proc/proc-empty-vm.c
+++ b/tools/testing/selftests/proc/proc-empty-vm.c
@@ -383,8 +383,10 @@ static int test_proc_pid_statm(pid_t pid)
assert(rv <= sizeof(buf));
if (0) {
ssize_t written = write(1, buf, rv);
+
if (written == -1) {
perror("write failed to /proc/${pid}");
+ return EXIT_FAILURE;
}
}
--
2.39.2
From: Eduard Zingerman <eddyz87(a)gmail.com>
[ Upstream commit f40bfd1679446b22d321e64a1fa98b7d07d2be08 ]
This is a preparatory change. A follow-up patch "bpf: verify callbacks
as if they are called unknown number of times" changes logic for
callbacks handling. While previously callbacks were verified as a
single function call, new scheme takes into account that callbacks
could be executed unknown number of times.
This has dire implications for bpf_loop_bench:
SEC("fentry/" SYS_PREFIX "sys_getpgid")
int benchmark(void *ctx)
{
for (int i = 0; i < 1000; i++) {
bpf_loop(nr_loops, empty_callback, NULL, 0);
__sync_add_and_fetch(&hits, nr_loops);
}
return 0;
}
W/o callbacks change verifier sees it as a 1000 calls to
empty_callback(). However, with callbacks change things become
exponential:
- i=0: state exploring empty_callback is scheduled with i=0 (a);
- i=1: state exploring empty_callback is scheduled with i=1;
...
- i=999: state exploring empty_callback is scheduled with i=999;
- state (a) is popped from stack;
- i=1: state exploring empty_callback is scheduled with i=1;
...
Avoid this issue by rewriting outer loop as bpf_loop().
Unfortunately, this adds a function call to a loop at runtime, which
negatively affects performance:
throughput latency
before: 149.919 ± 0.168 M ops/s, 6.670 ns/op
after : 137.040 ± 0.187 M ops/s, 7.297 ns/op
Acked-by: Andrii Nakryiko <andrii(a)kernel.org>
Signed-off-by: Eduard Zingerman <eddyz87(a)gmail.com>
Link: https://lore.kernel.org/r/20231121020701.26440-4-eddyz87@gmail.com
Signed-off-by: Alexei Starovoitov <ast(a)kernel.org>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
tools/testing/selftests/bpf/progs/bpf_loop_bench.c | 13 ++++++++-----
1 file changed, 8 insertions(+), 5 deletions(-)
diff --git a/tools/testing/selftests/bpf/progs/bpf_loop_bench.c b/tools/testing/selftests/bpf/progs/bpf_loop_bench.c
index 4ce76eb064c41..d461746fd3c1e 100644
--- a/tools/testing/selftests/bpf/progs/bpf_loop_bench.c
+++ b/tools/testing/selftests/bpf/progs/bpf_loop_bench.c
@@ -15,13 +15,16 @@ static int empty_callback(__u32 index, void *data)
return 0;
}
+static int outer_loop(__u32 index, void *data)
+{
+ bpf_loop(nr_loops, empty_callback, NULL, 0);
+ __sync_add_and_fetch(&hits, nr_loops);
+ return 0;
+}
+
SEC("fentry/" SYS_PREFIX "sys_getpgid")
int benchmark(void *ctx)
{
- for (int i = 0; i < 1000; i++) {
- bpf_loop(nr_loops, empty_callback, NULL, 0);
-
- __sync_add_and_fetch(&hits, nr_loops);
- }
+ bpf_loop(1000, outer_loop, NULL, 0);
return 0;
}
--
2.42.0