v3:
- pull forward to v6.8
- style and small fixups recommended by jcameron
- update syscall number (will do all archs when RFC tag drops)
- update for new folio code
- added OCP link to device-tracked address hotness proposal
- kept void* over __u64 simply because it integrates cleanly with
  existing migration code. If there are strong opinions, I can refactor.
This patch set proposes a syscall, analogous to move_pages, that
migrates pages between NUMA nodes using physical addressing.
The intent is to better enable user-land system-wide memory tiering
as CXL devices begin to provide memory resources on the PCIe bus.
For example, user-land software that makes decisions based on data
sources exposing physical address information no longer has to
convert that information to virtual addresses to act on it
(see the background below for how physical addresses are acquired).
The syscall requires CAP_SYS_ADMIN, since physical address source
information is typically protected by the same capability (or CAP_SYS_NICE).
This patch set is broken into 3 patches:
1) a refactor of existing migration code for code re-use
2) the sys_move_phys_pages system call
3) a ktest of the syscall
The sys_move_phys_pages system call validates that a page may be
migrated by checking the migratable status of each VMA mapping the
page, and the intersection of the cpuset policies of each VMA's task.
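For illustration, here is a minimal user-space sketch of how the
syscall might be invoked. This is an assumption only: the argument list
mirrors move_pages(2) minus the pid argument, and __NR_move_phys_pages
below is a placeholder; the real prototype and syscall number are
defined in patch 2.

/* Hypothetical usage sketch: migrate one physical page to node 1.
 * The syscall number, prototype, and flags are assumptions here.
 */
#include <stdio.h>
#include <unistd.h>
#include <sys/syscall.h>
#include <linux/mempolicy.h>

#ifndef __NR_move_phys_pages
#define __NR_move_phys_pages 462	/* placeholder, arch-specific */
#endif

int main(void)
{
	/* physical addresses obtained from pagemap, IBS/PEBS, CXL, etc. */
	void *pages[1]  = { (void *)0x100000000UL };
	int   nodes[1]  = { 1 };	/* target NUMA node */
	int   status[1] = { 0 };

	long ret = syscall(__NR_move_phys_pages, 1UL, pages, nodes, status,
			   MPOL_MF_MOVE);
	if (ret < 0)
		perror("move_phys_pages");
	else
		printf("page status: %d\n", status[0]);
	return 0;
}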
Background:
Userspace job schedulers, memory managers, and tiering software
solutions depend on page migration syscalls to reallocate resources
across NUMA nodes. Currently, these calls enable movement of memory
associated with a specific PID. Moves can be requested in coarse,
process-sized strokes (as with migrate_pages), and on specific virtual
pages (via move_pages).
However, a number of profiling mechanisms provide system-wide
information that would benefit from a physical-addressing version of
move_pages.
There are presently at least 4 ways userland can acquire physical
address information for use with this interface, and 1 hardware offload
mechanism being proposed by OpenCompute.
1) /proc/pid/pagemap: can be used to do page table translations.
   This is really only useful for testing, and the ktest was
   written using this functionality (see the sketch after this list).
2) X86: IBS (AMD) and PEBS (Intel) can be configured to return physical
   and/or virtual address information.
3) zoneinfo: /proc/zoneinfo exposes the start PFN of zones
4) /sys/kernel/mm/page_idle: A way to query whether a PFN is idle.
So long as the page size is known, this can be used to identify
system-wide idle pages that could be migrated to lower tiers.
https://docs.kernel.org/admin-guide/mm/idle_page_tracking.html
5) CXL Offloaded Hotness Monitoring (Proposed): a CXL memory device
   may provide hot/cold information about its memory. For example,
   it may report the hottest device physical addresses (DPA, 0-based)
   or host physical addresses (if it has access to the decoders needed
   for the conversion). DPA can be cheaply converted to HPA by
   combining it with the region address bases exposed under /sys/bus/cxl/.
See: https://www.opencompute.org/documents/ocp-cms-hotness-tracking-requirements…
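For reference, a minimal sketch of the pagemap translation mentioned in
(1). It assumes the documented /proc/<pid>/pagemap format (one 64-bit
entry per page, PFN in bits 0-54, "present" flag in bit 63) and a caller
privileged enough to read PFNs:

/* Translate a virtual address of a process to a physical address via
 * /proc/<pid>/pagemap. Returns 0 if the page is not present or on error.
 */
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

uint64_t virt_to_phys(pid_t pid, uintptr_t vaddr)
{
	char path[64];
	uint64_t entry = 0;
	long page_size = sysconf(_SC_PAGESIZE);

	snprintf(path, sizeof(path), "/proc/%d/pagemap", pid);
	int fd = open(path, O_RDONLY);
	if (fd < 0)
		return 0;

	off_t off = (vaddr / page_size) * sizeof(entry);
	if (pread(fd, &entry, sizeof(entry), off) != sizeof(entry))
		entry = 0;
	close(fd);

	if (!(entry & (1ULL << 63)))		/* bit 63: page present */
		return 0;

	uint64_t pfn = entry & ((1ULL << 55) - 1);	/* bits 0-54: PFN */
	return pfn * page_size + (vaddr % page_size);
}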
Information from these sources facilitates system-wide resource
management, but because migrate_pages and move_pages apply only to
individual tasks, their outputs must be converted back to virtual
addresses and re-associated with specific PIDs.
Doing this reverse-translation outside of the kernel requires
considerable space and compute, and the existing system calls then have
to perform the translation again. Much of this work can be avoided if
the pages can be migrated directly by physical address.
Gregory Price (3):
mm/migrate: refactor add_page_for_migration for code re-use
mm/migrate: Create move_phys_pages syscall
ktest: sys_move_phys_pages ktest
arch/x86/entry/syscalls/syscall_32.tbl | 1 +
arch/x86/entry/syscalls/syscall_64.tbl | 1 +
include/linux/syscalls.h | 5 +
include/uapi/asm-generic/unistd.h | 8 +-
kernel/sys_ni.c | 1 +
mm/migrate.c | 288 ++++++++++++++++++++----
tools/include/uapi/asm-generic/unistd.h | 8 +-
tools/testing/selftests/mm/migration.c | 99 ++++++++
8 files changed, 370 insertions(+), 41 deletions(-)
--
2.39.1
On some systems, the netcat server can incur a delay before it starts
listening. When this happens, the test can randomly fail at various
points. This is an example error message:
# ip gre none gso
# encap 192.168.1.1 to 192.168.1.2, type gre, mac none len 2000
# test basic connectivity
# Ncat: Connection refused.
The issue stems from a race condition between the netcat client and
server. The test author had addressed this problem with a fixed sleep,
which this patch removes.
Instead, this patch introduces a function that waits for up to two
seconds, returning early as soon as the port is reported to be
listening.
Signed-off-by: Alessandro Carminati (Red Hat) <alessandro.carminati(a)gmail.com>
---
tools/testing/selftests/bpf/test_tc_tunnel.sh | 13 ++++++++++++-
1 file changed, 12 insertions(+), 1 deletion(-)
diff --git a/tools/testing/selftests/bpf/test_tc_tunnel.sh b/tools/testing/selftests/bpf/test_tc_tunnel.sh
index 910044f08908..7989ec608454 100755
--- a/tools/testing/selftests/bpf/test_tc_tunnel.sh
+++ b/tools/testing/selftests/bpf/test_tc_tunnel.sh
@@ -72,7 +72,6 @@ cleanup() {
server_listen() {
ip netns exec "${ns2}" nc "${netcat_opt}" -l "${port}" > "${outfile}" &
server_pid=$!
- sleep 0.2
}
client_connect() {
@@ -93,6 +92,16 @@ verify_data() {
fi
}
+wait_for_port() {
+ for i in $(seq 20); do
+ if ip netns exec "${ns2}" ss ${2:--4}OHntl | grep -q "$1"; then
+ return 0
+ fi
+ sleep 0.1
+ done
+ return 1
+}
+
set -e
# no arguments: automated test, run all
@@ -193,6 +202,7 @@ setup
# basic communication works
echo "test basic connectivity"
server_listen
+wait_for_port ${port} ${netcat_opt}
client_connect
verify_data
@@ -204,6 +214,7 @@ ip netns exec "${ns1}" tc filter add dev veth1 egress \
section "encap_${tuntype}_${mac}"
echo "test bpf encap without decap (expect failure)"
server_listen
+wait_for_port ${port} ${netcat_opt}
! client_connect
if [[ "$tuntype" =~ "udp" ]]; then
--
2.34.1
Sub-NUMA Clustering (SNC) allows splitting CPU cores, caches and memory
into multiple NUMA nodes. When enabled, NUMA-aware applications can
achieve better performance on bigger server platforms.
The series adding SNC support to the kernel is currently in review [1]
but the selftests for resctrl need NUMA-aware adjustments to use these
changes. Issues with resctrl selftests not working properly with SNC
enabled were originally reported by Shaopeng Tan [2][3] and the
following series resolves them.
The main concept currently missing from the resctrl selftests is that
while resctrl tracks memory accesses on a single NUMA node (which is
normally the same as the CPU socket), on machines with SNC enabled
memory accesses can leak outside of the local NUMA node into other
NUMA nodes on the same socket. In that case resctrl could report a
diminished value in one of its monitoring technologies: Cache
Monitoring Technology (CMT) or Memory Bandwidth Monitoring (MBM).
The implemented solutions for both CMT and MBM follow the same idea:
simply sum the values reported by the different NUMA nodes for a single
Resource Monitoring ID (RMID).
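As an illustration of that idea (not the actual selftest code), here is
a minimal sketch that sums llc_occupancy across all mon_L3_* domains of
a monitoring group, assuming the standard /sys/fs/resctrl/mon_data
layout:

/* Sum llc_occupancy over all L3 monitoring domains of one group.
 * With SNC enabled, the values for a single RMID are spread across
 * several domains on the same socket; adding them restores the
 * socket-wide view. Call with e.g. "/sys/fs/resctrl/mon_data".
 */
#include <dirent.h>
#include <stdio.h>
#include <string.h>

unsigned long long sum_llc_occupancy(const char *mon_data)
{
	unsigned long long total = 0, val;
	char path[512];
	struct dirent *de;
	DIR *dir = opendir(mon_data);

	if (!dir)
		return 0;

	while ((de = readdir(dir))) {
		if (strncmp(de->d_name, "mon_L3_", 7))
			continue;
		snprintf(path, sizeof(path), "%s/%s/llc_occupancy",
			 mon_data, de->d_name);
		FILE *f = fopen(path, "r");
		if (f && fscanf(f, "%llu", &val) == 1)
			total += val;
		if (f)
			fclose(f);
	}
	closedir(dir);
	return total;
}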
The series was tested on Ice Lake server platforms with SNC disabled,
SNC-2 and SNC-4. The tests were also run with and without kernel
support for SNC.
Series applies cleanly on kselftest/next.
[1] https://lore.kernel.org/all/20240228112215.8044-tony.luck@intel.com/
[2] https://lore.kernel.org/all/TYAPR01MB6330B9B17686EF426D2C3F308B25A@TYAPR01M…
[3] https://lore.kernel.org/lkml/TYAPR01MB6330A4EB3633B791939EA45E8B39A@TYAPR01…
Maciej Wieczor-Retman (4):
selftests/resctrl: Adjust effective L3 cache size with SNC enabled
selftests/resctrl: SNC support for CMT
selftests/resctrl: SNC support for MBM
selftests/resctrl: Adjust SNC support messages
tools/testing/selftests/resctrl/cache.c | 17 ++-
tools/testing/selftests/resctrl/cat_test.c | 2 +-
tools/testing/selftests/resctrl/cmt_test.c | 6 +-
tools/testing/selftests/resctrl/mba_test.c | 3 +-
tools/testing/selftests/resctrl/mbm_test.c | 4 +-
tools/testing/selftests/resctrl/resctrl.h | 13 +-
tools/testing/selftests/resctrl/resctrl_val.c | 46 ++++---
tools/testing/selftests/resctrl/resctrlfs.c | 128 +++++++++++++++++-
8 files changed, 185 insertions(+), 34 deletions(-)
--
2.44.0
Hi,
With commit v6.8-11167-g4438a810f396 in the vanilla torvalds tree, there
seem to be problems with the icmp_redirect.sh tests.
The iproute2-next tools were used, at commit 7a6d30c95da9.
# timeout set to 3600
# selftests: net: icmp_redirect.sh
#
# ###########################################################################
# Legacy routing
# ###########################################################################
#
# TEST: IPv4: redirect exception [FAIL]
# TEST: IPv6: redirect exception [ OK ]
# TEST: IPv4: redirect exception plus mtu [FAIL]
# TEST: IPv6: redirect exception plus mtu [ OK ]
# TEST: IPv4: routing reset [ OK ]
# TEST: IPv6: routing reset [ OK ]
# TEST: IPv4: mtu exception [ OK ]
# TEST: IPv6: mtu exception [ OK ]
# TEST: IPv4: mtu exception plus redirect [FAIL]
# TEST: IPv6: mtu exception plus redirect [ OK ]
#
# ###########################################################################
# Legacy routing with VRF
# ###########################################################################
#
# TEST: IPv4: redirect exception [FAIL]
# TEST: IPv6: redirect exception [ OK ]
# TEST: IPv4: redirect exception plus mtu [FAIL]
# TEST: IPv6: redirect exception plus mtu [ OK ]
# TEST: IPv4: routing reset [ OK ]
# TEST: IPv6: routing reset [ OK ]
# TEST: IPv4: mtu exception [ OK ]
# TEST: IPv6: mtu exception [ OK ]
# TEST: IPv4: mtu exception plus redirect [FAIL]
# TEST: IPv6: mtu exception plus redirect [ OK ]
#
# ###########################################################################
# Routing with nexthop objects
# ###########################################################################
#
# TEST: IPv4: redirect exception [FAIL]
# TEST: IPv6: redirect exception [ OK ]
# TEST: IPv4: redirect exception plus mtu [FAIL]
# TEST: IPv6: redirect exception plus mtu [ OK ]
# TEST: IPv4: routing reset [ OK ]
# TEST: IPv6: routing reset [ OK ]
# TEST: IPv4: mtu exception [ OK ]
# TEST: IPv6: mtu exception [ OK ]
# TEST: IPv4: mtu exception plus redirect [FAIL]
# TEST: IPv6: mtu exception plus redirect [ OK ]
#
# ###########################################################################
# Routing with nexthop objects and VRF
# ###########################################################################
#
# TEST: IPv4: redirect exception [FAIL]
# TEST: IPv6: redirect exception [ OK ]
# TEST: IPv4: redirect exception plus mtu [FAIL]
# TEST: IPv6: redirect exception plus mtu [ OK ]
# TEST: IPv4: routing reset [ OK ]
# TEST: IPv6: routing reset [ OK ]
# TEST: IPv4: mtu exception [ OK ]
# TEST: IPv6: mtu exception [ OK ]
# TEST: IPv4: mtu exception plus redirect [FAIL]
# TEST: IPv6: mtu exception plus redirect [ OK ]
#
# Tests passed: 28
# Tests failed: 12
# Tests xfailed: 0
not ok 45 selftests: net: icmp_redirect.sh # exit=1
These errors were not introduced by this commit; they were already present at least as of 6.8-rc7.
Hope this helps.
Best regards,
Mirsad Todorovac
This patch enhances the BPF helpers by adding a kfunc to retrieve the
cgroup v2 of a task, addressing a previous limitation where only
bpf_task_get_cgroup1 was available for cgroup v1. The new kfunc is
particularly useful for scenarios where obtaining the cgroup ID of a
task other than the "current" one is necessary, which the existing
bpf_get_current_cgroup_id helper cannot accommodate. A specific use
case at Netflix involved the sched_switch tracepoint, where we had to
get the cgroup IDs of both the prev and next tasks.
The bpf_task_get_cgroup kfunc acquires and returns a reference to a
task's default cgroup, ensuring thread-safe access via RCU read locking
and unlocking. It leverages the existing cgroup.h helper
task_dfl_cgroup(), together with cgroup_tryget(), to safely acquire a
reference to the cgroup.
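For illustration, here is a minimal BPF-side sketch of the sched_switch
use case described above. It is not part of this patch; the section
name, the bpf_printk output, and the direct cgrp->kn->id read are just
one possible way to consume the kfunc:

/* Illustrative tp_btf program: fetch the cgroup v2 IDs of both prev
 * and next tasks on every context switch. The acquired reference is
 * not stored in a map, so it must be released.
 */
#include "vmlinux.h"
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_tracing.h>

extern struct cgroup *bpf_task_get_cgroup(struct task_struct *task) __ksym;
extern void bpf_cgroup_release(struct cgroup *cgrp) __ksym;

SEC("tp_btf/sched_switch")
int BPF_PROG(on_switch, bool preempt, struct task_struct *prev,
	     struct task_struct *next)
{
	struct cgroup *cgrp;
	u64 prev_cgid = 0, next_cgid = 0;

	cgrp = bpf_task_get_cgroup(prev);
	if (cgrp) {
		prev_cgid = cgrp->kn->id;
		bpf_cgroup_release(cgrp);
	}

	cgrp = bpf_task_get_cgroup(next);
	if (cgrp) {
		next_cgid = cgrp->kn->id;
		bpf_cgroup_release(cgrp);
	}

	bpf_printk("prev cgid=%llu next cgid=%llu", prev_cgid, next_cgid);
	return 0;
}

char LICENSE[] SEC("license") = "GPL";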
Signed-off-by: Jose Fernandez <josef(a)netflix.com>
Reviewed-by: Tycho Andersen <tycho(a)tycho.pizza>
---
V1 -> V2: Return a pointer to the cgroup instead of the cgroup ID
kernel/bpf/helpers.c | 26 ++++++++++++++++++++++++++
1 file changed, 26 insertions(+)
diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
index a89587859571..bbd19d5eedb6 100644
--- a/kernel/bpf/helpers.c
+++ b/kernel/bpf/helpers.c
@@ -2266,6 +2266,31 @@ bpf_task_get_cgroup1(struct task_struct *task, int hierarchy_id)
return NULL;
return cgrp;
}
+
+/**
+ * bpf_task_get_cgroup - Acquire a reference to the default cgroup of a task.
+ * @task: The target task
+ *
+ * This function returns the task's default cgroup, primarily
+ * designed for use with cgroup v2. In cgroup v1, the concept of default
+ * cgroup varies by subsystem, and while this function will work with
+ * cgroup v1, it's recommended to use bpf_task_get_cgroup1 instead.
+ * A cgroup returned by this kfunc which is not subsequently stored in a
+ * map, must be released by calling bpf_cgroup_release().
+ *
+ * Return: On success, the cgroup is returned. On failure, NULL is returned.
+ */
+__bpf_kfunc struct cgroup *bpf_task_get_cgroup(struct task_struct *task)
+{
+ struct cgroup *cgrp;
+
+ rcu_read_lock();
+ cgrp = task_dfl_cgroup(task);
+ if (!cgroup_tryget(cgrp))
+ cgrp = NULL;
+ rcu_read_unlock();
+ return cgrp;
+}
#endif /* CONFIG_CGROUPS */
/**
@@ -2573,6 +2598,7 @@ BTF_ID_FLAGS(func, bpf_cgroup_ancestor, KF_ACQUIRE | KF_RCU | KF_RET_NULL)
BTF_ID_FLAGS(func, bpf_cgroup_from_id, KF_ACQUIRE | KF_RET_NULL)
BTF_ID_FLAGS(func, bpf_task_under_cgroup, KF_RCU)
BTF_ID_FLAGS(func, bpf_task_get_cgroup1, KF_ACQUIRE | KF_RCU | KF_RET_NULL)
+BTF_ID_FLAGS(func, bpf_task_get_cgroup, KF_ACQUIRE | KF_RCU | KF_RET_NULL)
#endif
BTF_ID_FLAGS(func, bpf_task_from_pid, KF_ACQUIRE | KF_RET_NULL)
BTF_ID_FLAGS(func, bpf_throw)
base-commit: 4c8644f86c854c214aaabbcc24a27fa4c7e6a951
--
2.40.1
Add the ability to parse multiple files. Additionally, add the ability
to parse all results in the KUnit debugfs repository.
How to parse multiple files:
./tools/testing/kunit/kunit.py parse results.log results2.log
How to parse all files in a directory:
./tools/testing/kunit/kunit.py parse directory_path/*
How to parse KUnit debugfs repository:
./tools/testing/kunit/kunit.py parse debugfs
For each file, the parser outputs the file name, results, and test
summary. At the end of all parsing, the parser outputs a total summary
line.
This feature can be easily tested using the
tools/testing/kunit/test_data/ directory.
Signed-off-by: Rae Moar <rmoar(a)google.com>
---
Changes since v3:
- Changed from input() to stdin
- Added checking for non-regular files
- Spacing fix
- Small printing fix
tools/testing/kunit/kunit.py | 54 +++++++++++++++++++++++++-----------
1 file changed, 38 insertions(+), 16 deletions(-)
diff --git a/tools/testing/kunit/kunit.py b/tools/testing/kunit/kunit.py
index bc74088c458a..641b8ca83e3e 100755
--- a/tools/testing/kunit/kunit.py
+++ b/tools/testing/kunit/kunit.py
@@ -511,19 +511,40 @@ def exec_handler(cli_args: argparse.Namespace) -> None:
def parse_handler(cli_args: argparse.Namespace) -> None:
- if cli_args.file is None:
- sys.stdin.reconfigure(errors='backslashreplace') # type: ignore
- kunit_output = sys.stdin # type: Iterable[str]
- else:
- with open(cli_args.file, 'r', errors='backslashreplace') as f:
- kunit_output = f.read().splitlines()
- # We know nothing about how the result was created!
- metadata = kunit_json.Metadata()
- request = KunitParseRequest(raw_output=cli_args.raw_output,
- json=cli_args.json)
- result, _ = parse_tests(request, metadata, kunit_output)
- if result.status != KunitStatus.SUCCESS:
- sys.exit(1)
+ parsed_files = cli_args.files # type: List[str]
+ total_test = kunit_parser.Test()
+ total_test.status = kunit_parser.TestStatus.SUCCESS
+ if not parsed_files:
+ parsed_files.append('/dev/stdin')
+ elif len(parsed_files) == 1 and parsed_files[0] == "debugfs":
+ parsed_files.pop()
+ for (root, _, files) in os.walk("/sys/kernel/debug/kunit"):
+ parsed_files.extend(os.path.join(root, f) for f in files if f == "results")
+ if not parsed_files:
+ print("No files found.")
+
+ for file in parsed_files:
+ if os.path.isdir(file):
+ print(f'Ignoring directory "{file}"')
+ elif os.path.exists(file):
+ print(file)
+ with open(file, 'r', errors='backslashreplace') as f:
+ kunit_output = f.read().splitlines()
+ # We know nothing about how the result was created!
+ metadata = kunit_json.Metadata()
+ request = KunitParseRequest(raw_output=cli_args.raw_output,
+ json=cli_args.json)
+ _, test = parse_tests(request, metadata, kunit_output)
+ total_test.subtests.append(test)
+ else:
+ print(f'Could not find "{file}"')
+
+ if len(parsed_files) > 1: # if more than one file was parsed output total summary
+ print('All files parsed.')
+ if not request.raw_output:
+ stdout.print_with_timestamp(kunit_parser.DIVIDER)
+ kunit_parser.bubble_up_test_results(total_test)
+ kunit_parser.print_summary_line(total_test)
subcommand_handlers_map = {
@@ -569,9 +590,10 @@ def main(argv: Sequence[str]) -> None:
help='Parses KUnit results from a file, '
'and parses formatted results.')
add_parse_opts(parse_parser)
- parse_parser.add_argument('file',
- help='Specifies the file to read results from.',
- type=str, nargs='?', metavar='input_file')
+ parse_parser.add_argument('files',
+ help='List of file paths to read results from or keyword'
+ '"debugfs" to read all results from the debugfs directory.',
+ type=str, nargs='*', metavar='input_files')
cli_args = parser.parse_args(massage_argv(argv))
base-commit: 806cb2270237ce2ec672a407d66cee17a07d3aa2
--
2.44.0.291.gc1ea87d7ee-goog
Add the ability to parse multiple files. Additionally, add the ability
to parse all results in the KUnit debugfs repository.
How to parse multiple files:
./tools/testing/kunit/kunit.py parse results.log results2.log
How to parse all files in a directory:
./tools/testing/kunit/kunit.py parse directory_path/*
How to parse KUnit debugfs repository:
./tools/testing/kunit/kunit.py parse debugfs
For each file, the parser outputs the file name, results, and test
summary. At the end of all parsing, the parser outputs a total summary
line.
This feature can be easily tested using the
tools/testing/kunit/test_data/ directory.
Signed-off-by: Rae Moar <rmoar(a)google.com>
---
Changes since v2:
- Fixed bug with input from command line. I changed this to use
  input(). Daniel, let me know if this works for you.
- Added more specific warning messages
tools/testing/kunit/kunit.py | 56 +++++++++++++++++++++++++-----------
1 file changed, 40 insertions(+), 16 deletions(-)
diff --git a/tools/testing/kunit/kunit.py b/tools/testing/kunit/kunit.py
index bc74088c458a..1aa3d736d80c 100755
--- a/tools/testing/kunit/kunit.py
+++ b/tools/testing/kunit/kunit.py
@@ -511,19 +511,42 @@ def exec_handler(cli_args: argparse.Namespace) -> None:
def parse_handler(cli_args: argparse.Namespace) -> None:
- if cli_args.file is None:
- sys.stdin.reconfigure(errors='backslashreplace') # type: ignore
- kunit_output = sys.stdin # type: Iterable[str]
- else:
- with open(cli_args.file, 'r', errors='backslashreplace') as f:
- kunit_output = f.read().splitlines()
- # We know nothing about how the result was created!
- metadata = kunit_json.Metadata()
- request = KunitParseRequest(raw_output=cli_args.raw_output,
- json=cli_args.json)
- result, _ = parse_tests(request, metadata, kunit_output)
- if result.status != KunitStatus.SUCCESS:
- sys.exit(1)
+ parsed_files = cli_args.files # type: List[str]
+ total_test = kunit_parser.Test()
+ total_test.status = kunit_parser.TestStatus.SUCCESS
+ if not parsed_files:
+ parsed_files.append(input("File path: "))
+
+ if parsed_files[0] == "debugfs" and len(parsed_files) == 1:
+ parsed_files.pop()
+ for (root, _, files) in os.walk("/sys/kernel/debug/kunit"):
+ parsed_files.extend(os.path.join(root, f) for f in files if f == "results")
+
+ if not parsed_files:
+ print("No files found.")
+
+ for file in parsed_files:
+ if os.path.isfile(file):
+ print(file)
+ with open(file, 'r', errors='backslashreplace') as f:
+ kunit_output = f.read().splitlines()
+ # We know nothing about how the result was created!
+ metadata = kunit_json.Metadata()
+ request = KunitParseRequest(raw_output=cli_args.raw_output,
+ json=cli_args.json)
+ _, test = parse_tests(request, metadata, kunit_output)
+ total_test.subtests.append(test)
+ elif os.path.isdir(file):
+ print("Ignoring directory ", file)
+ else:
+ print("Could not find ", file)
+
+ if len(parsed_files) > 1: # if more than one file was parsed output total summary
+ print('All files parsed.')
+ if not request.raw_output:
+ stdout.print_with_timestamp(kunit_parser.DIVIDER)
+ kunit_parser.bubble_up_test_results(total_test)
+ kunit_parser.print_summary_line(total_test)
subcommand_handlers_map = {
@@ -569,9 +592,10 @@ def main(argv: Sequence[str]) -> None:
help='Parses KUnit results from a file, '
'and parses formatted results.')
add_parse_opts(parse_parser)
- parse_parser.add_argument('file',
- help='Specifies the file to read results from.',
- type=str, nargs='?', metavar='input_file')
+ parse_parser.add_argument('files',
+ help='List of file paths to read results from or keyword'
+ '"debugfs" to read all results from the debugfs directory.',
+ type=str, nargs='*', metavar='input_files')
cli_args = parser.parse_args(massage_argv(argv))
base-commit: 806cb2270237ce2ec672a407d66cee17a07d3aa2
--
2.44.0.278.ge034bb2e1d-goog