- Linux-kselftest-mirror - lists.linaro.org

[PATCH net-next v5] ipv6: add `force_forwarding` sysctl to enable per-interface forwarding

by Gabriel Goller

It is currently impossible to enable ipv6 forwarding on a per-interface
basis like in ipv4. To enable forwarding on an ipv6 interface we need to
enable it on all interfaces and disable it on the other interfaces using
a netfilter rule. This is especially cumbersome if you have lots of
interfaces and only want to enable forwarding on a few. According to the
sysctl docs [0], `net.ipv6.conf.all.forwarding` enables forwarding for
all interfaces, while the interface-specific
`net.ipv6.conf.<interface>.forwarding` configures the interface
Host/Router configuration.

Introduce a new sysctl flag `force_forwarding`, which can be set on every
interface. The ip6_forward() function will then check if the global
forwarding flag OR the force_forwarding flag is active and forward the
packet.

To preserve backwards-compatibility, reset the flag (on all interfaces)
to 0 if the net.ipv6.conf.all.forwarding flag is set to 0.

Add a short selftest that checks if a packet gets forwarded with and
without `force_forwarding`.

[0]: https://www.kernel.org/doc/Documentation/networking/ip-sysctl.txt

Signed-off-by: Gabriel Goller <g.goller(a)proxmox.com>
---
v5:
* update conf/all/forwarding docs
* simplified backwards-compat comment
* remove ASSERT_RTNL as it's guaranteed by __in6_dev_get_rtnl_net()
  already
* change ip6_forward logic so that it doesn't depend on the idev existing
* move WRITE_ONCE inside device lock
v4: https://lore.kernel.org/netdev/20250703160154.560239-1-g.goller@proxmox.com/
* actually write the sysctl value to the table
* use ASSERT_RTNL() when forwarding the sysctl change
* remove useless comments in function body
* simplify forwarding and force_forwarding check in ip6_output.c
* fix code backticks in Documentation (double instead of single)
* add selftests
v3: https://lore.kernel.org/netdev/20250702074619.139031-1-g.goller@proxmox.com/
* remove forwarding=0 setting force_forwarding=0 globally.
* add min and max (0 and 1) value to sysctl.
v2: https://lore.kernel.org/netdev/20250701140423.487411-1-g.goller@proxmox.com/
* rename from `do_forwarding` to `force_forwarding`.
* add global `force_forwarding` flag which will enable `force_forwarding`
  on every interface like the `ipv4.all.forwarding` flag.
* `forwarding`=0 will disable global and per-interface `force_forwarding`.
* export option as NETCONFA_FORCE_FORWARDING.
v1: https://lore.kernel.org/netdev/20250702074619.139031-1-g.goller@proxmox.com/

 Documentation/networking/ip-sysctl.rst        |   9 +-
 include/linux/ipv6.h                          |   1 +
 include/uapi/linux/ipv6.h                     |   1 +
 include/uapi/linux/netconf.h                  |   1 +
 include/uapi/linux/sysctl.h                   |   1 +
 net/ipv6/addrconf.c                           |  83 ++++++++++++++
 net/ipv6/ip6_output.c                         |   3 +-
 tools/testing/selftests/net/Makefile          |   1 +
 .../selftests/net/ipv6_force_forwarding.sh    | 105 ++++++++++++++++++
 9 files changed, 202 insertions(+), 3 deletions(-)
 create mode 100644 tools/testing/selftests/net/ipv6_force_forwarding.sh

diff --git a/Documentation/networking/ip-sysctl.rst b/Documentation/networking/ip-sysctl.rst
index 0f1251cce314..6d92bae0257a 100644
--- a/Documentation/networking/ip-sysctl.rst
+++ b/Documentation/networking/ip-sysctl.rst
@@ -2281,8 +2281,8 @@ conf/all/disable_ipv6 - BOOLEAN
 conf/all/forwarding - BOOLEAN
 	Enable global IPv6 forwarding between all interfaces.
 
-	IPv4 and IPv6 work differently here; e.g. netfilter must be used
-	to control which interfaces may forward packets and which not.
+	IPv4 and IPv6 work differently here; the ``force_forwarding`` flag must
+	be used to control which interfaces may forward packets.
 
 	This also sets all interfaces' Host/Router setting
 	'forwarding' to the specified value. See below for details.
@@ -2292,6 +2292,11 @@ conf/all/forwarding - BOOLEAN
 proxy_ndp - BOOLEAN
 	Do proxy ndp.
 
+force_forwarding - BOOLEAN
+	Enable forwarding on this interface only -- regardless of the setting on
+	``conf/all/forwarding``.
When setting ``conf.all.forwarding`` to 0, + the ``force_forwarding`` flag will be reset on all interfaces. + fwmark_reflect - BOOLEAN Controls the fwmark of kernel-generated IPv6 reply packets that are not associated with a socket for example, TCP RSTs or ICMPv6 echo replies). diff --git a/include/linux/ipv6.h b/include/linux/ipv6.h index 5aeeed22f35b..d975a86f29be 100644 --- a/include/linux/ipv6.h +++ b/include/linux/ipv6.h @@ -17,6 +17,7 @@ struct ipv6_devconf { __s32 hop_limit; __s32 mtu6; __s32 forwarding; + __s32 force_forwarding; __s32 disable_policy; __s32 proxy_ndp; __cacheline_group_end(ipv6_devconf_read_txrx); diff --git a/include/uapi/linux/ipv6.h b/include/uapi/linux/ipv6.h index cf592d7b630f..d4d3ae774b26 100644 --- a/include/uapi/linux/ipv6.h +++ b/include/uapi/linux/ipv6.h @@ -199,6 +199,7 @@ enum { DEVCONF_NDISC_EVICT_NOCARRIER, DEVCONF_ACCEPT_UNTRACKED_NA, DEVCONF_ACCEPT_RA_MIN_LFT, + DEVCONF_FORCE_FORWARDING, DEVCONF_MAX }; diff --git a/include/uapi/linux/netconf.h b/include/uapi/linux/netconf.h index fac4edd55379..1c8c84d65ae3 100644 --- a/include/uapi/linux/netconf.h +++ b/include/uapi/linux/netconf.h @@ -19,6 +19,7 @@ enum { NETCONFA_IGNORE_ROUTES_WITH_LINKDOWN, NETCONFA_INPUT, NETCONFA_BC_FORWARDING, + NETCONFA_FORCE_FORWARDING, __NETCONFA_MAX }; #define NETCONFA_MAX (__NETCONFA_MAX - 1) diff --git a/include/uapi/linux/sysctl.h b/include/uapi/linux/sysctl.h index 8981f00204db..63d1464cb71c 100644 --- a/include/uapi/linux/sysctl.h +++ b/include/uapi/linux/sysctl.h @@ -573,6 +573,7 @@ enum { NET_IPV6_ACCEPT_RA_FROM_LOCAL=26, NET_IPV6_ACCEPT_RA_RT_INFO_MIN_PLEN=27, NET_IPV6_RA_DEFRTR_METRIC=28, + NET_IPV6_FORCE_FORWARDING=29, __NET_IPV6_MAX }; diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c index ba2ec7c870cc..92acf44febd1 100644 --- a/net/ipv6/addrconf.c +++ b/net/ipv6/addrconf.c @@ -239,6 +239,7 @@ static struct ipv6_devconf ipv6_devconf __read_mostly = { .ndisc_evict_nocarrier = 1, .ra_honor_pio_life = 0, .ra_honor_pio_pflag = 0, + 
.force_forwarding = 0, }; static struct ipv6_devconf ipv6_devconf_dflt __read_mostly = { @@ -303,6 +304,7 @@ static struct ipv6_devconf ipv6_devconf_dflt __read_mostly = { .ndisc_evict_nocarrier = 1, .ra_honor_pio_life = 0, .ra_honor_pio_pflag = 0, + .force_forwarding = 0, }; /* Check if link is ready: is it up and is a valid qdisc available */ @@ -857,6 +859,9 @@ static void addrconf_forward_change(struct net *net, __s32 newf) idev = __in6_dev_get_rtnl_net(dev); if (idev) { int changed = (!idev->cnf.forwarding) ^ (!newf); + /* Disabling all.forwarding sets 0 to force_forwarding for all interfaces */ + if (newf == 0) + WRITE_ONCE(idev->cnf.force_forwarding, newf); WRITE_ONCE(idev->cnf.forwarding, newf); if (changed) @@ -5719,6 +5724,7 @@ static void ipv6_store_devconf(const struct ipv6_devconf *cnf, array[DEVCONF_ACCEPT_UNTRACKED_NA] = READ_ONCE(cnf->accept_untracked_na); array[DEVCONF_ACCEPT_RA_MIN_LFT] = READ_ONCE(cnf->accept_ra_min_lft); + array[DEVCONF_FORCE_FORWARDING] = READ_ONCE(cnf->force_forwarding); } static inline size_t inet6_ifla6_size(void) @@ -6747,6 +6753,76 @@ static int addrconf_sysctl_disable_policy(const struct ctl_table *ctl, int write return ret; } +static void addrconf_force_forward_change(struct net *net, __s32 newf) +{ + struct net_device *dev; + struct inet6_dev *idev; + + for_each_netdev(net, dev) { + idev = __in6_dev_get_rtnl_net(dev); + if (idev) { + int changed = (!idev->cnf.force_forwarding) ^ (!newf); + + WRITE_ONCE(idev->cnf.force_forwarding, newf); + if (changed) { + inet6_netconf_notify_devconf(dev_net(dev), RTM_NEWNETCONF, + NETCONFA_FORCE_FORWARDING, + dev->ifindex, &idev->cnf); + } + } + } +} + +static int addrconf_sysctl_force_forwarding(const struct ctl_table *ctl, int write, + void *buffer, size_t *lenp, loff_t *ppos) +{ + struct inet6_dev *idev = ctl->extra1; + struct ctl_table tmp_ctl = *ctl; + struct net *net = ctl->extra2; + int *valp = ctl->data; + int new_val = *valp; + int old_val = *valp; + loff_t pos = *ppos; + int 
ret; + + tmp_ctl.extra1 = SYSCTL_ZERO; + tmp_ctl.extra2 = SYSCTL_ONE; + tmp_ctl.data = &new_val; + + ret = proc_douintvec_minmax(&tmp_ctl, write, buffer, lenp, ppos); + + if (write && old_val != new_val) { + if (!rtnl_net_trylock(net)) + return restart_syscall(); + + WRITE_ONCE(*valp, new_val); + + if (valp == &net->ipv6.devconf_dflt->force_forwarding) { + inet6_netconf_notify_devconf(net, RTM_NEWNETCONF, + NETCONFA_FORCE_FORWARDING, + NETCONFA_IFINDEX_DEFAULT, + net->ipv6.devconf_dflt); + } else if (valp == &net->ipv6.devconf_all->force_forwarding) { + inet6_netconf_notify_devconf(net, RTM_NEWNETCONF, + NETCONFA_FORCE_FORWARDING, + NETCONFA_IFINDEX_ALL, + net->ipv6.devconf_all); + + addrconf_force_forward_change(net, new_val); + } else { + inet6_netconf_notify_devconf(net, RTM_NEWNETCONF, + NETCONFA_FORCE_FORWARDING, + idev->dev->ifindex, + &idev->cnf); + } + rtnl_net_unlock(net); + } + + if (ret) + *ppos = pos; + return ret; +} + static int minus_one = -1; static const int two_five_five = 255; static u32 ioam6_if_id_max = U16_MAX; @@ -7217,6 +7293,13 @@ static const struct ctl_table addrconf_sysctl[] = { .extra1 = SYSCTL_ZERO, .extra2 = SYSCTL_TWO, }, + { + .procname = "force_forwarding", + .data = &ipv6_devconf.force_forwarding, + .maxlen = sizeof(int), + .mode = 0644, + .proc_handler = addrconf_sysctl_force_forwarding, + }, }; static int __addrconf_sysctl_register(struct net *net, char *dev_name, diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c index 7bd29a9ff0db..3853090d7282 100644 --- a/net/ipv6/ip6_output.c +++ b/net/ipv6/ip6_output.c @@ -509,7 +509,8 @@ int ip6_forward(struct sk_buff *skb) u32 mtu; idev = __in6_dev_get_safely(dev_get_by_index_rcu(net, IP6CB(skb)->iif)); - if (READ_ONCE(net->ipv6.devconf_all->forwarding) == 0) + if (!READ_ONCE(net->ipv6.devconf_all->forwarding) && + (!idev || !READ_ONCE(idev->cnf.force_forwarding))) goto error; if (skb->pkt_type != PACKET_HOST) diff --git a/tools/testing/selftests/net/Makefile 
b/tools/testing/selftests/net/Makefile index 332f387615d7..f64ec8a15a77 100644 --- a/tools/testing/selftests/net/Makefile +++ b/tools/testing/selftests/net/Makefile @@ -112,6 +112,7 @@ TEST_PROGS += skf_net_off.sh TEST_GEN_FILES += skf_net_off TEST_GEN_FILES += tfo TEST_PROGS += tfo_passive.sh +TEST_PROGS += ipv6_force_forwarding.sh # YNL files, must be before "include ..lib.mk" YNL_GEN_FILES := busy_poller netlink-dumps diff --git a/tools/testing/selftests/net/ipv6_force_forwarding.sh b/tools/testing/selftests/net/ipv6_force_forwarding.sh new file mode 100644 index 000000000000..62adc9d4afc9 --- /dev/null +++ b/tools/testing/selftests/net/ipv6_force_forwarding.sh @@ -0,0 +1,105 @@ +#!/bin/bash +# SPDX-License-Identifier: GPL-2.0 +# +# Test IPv6 force_forwarding interface property +# +# This test verifies that the force_forwarding property works correctly: +# - When global forwarding is disabled, packets are not forwarded normally +# - When force_forwarding is enabled on an interface, packets are forwarded +# regardless of the global forwarding setting + +source lib.sh + +cleanup() { + cleanup_ns $ns1 $ns2 $ns3 +} + +trap cleanup EXIT + +setup_test() { + # Create three namespaces: sender, router, receiver + setup_ns ns1 ns2 ns3 + + # Create veth pairs: ns1 <-> ns2 <-> ns3 + ip link add name veth12 type veth peer name veth21 + ip link add name veth23 type veth peer name veth32 + + # Move interfaces to namespaces + ip link set veth12 netns $ns1 + ip link set veth21 netns $ns2 + ip link set veth23 netns $ns2 + ip link set veth32 netns $ns3 + + # Configure interfaces + ip -n $ns1 addr add 2001:db8:1::1/64 dev veth12 + ip -n $ns2 addr add 2001:db8:1::2/64 dev veth21 + ip -n $ns2 addr add 2001:db8:2::1/64 dev veth23 + ip -n $ns3 addr add 2001:db8:2::2/64 dev veth32 + + # Bring up interfaces + ip -n $ns1 link set veth12 up + ip -n $ns2 link set veth21 up + ip -n $ns2 link set veth23 up + ip -n $ns3 link set veth32 up + + # Add routes + ip -n $ns1 route add 2001:db8:2::/64 
via 2001:db8:1::2 + ip -n $ns3 route add 2001:db8:1::/64 via 2001:db8:2::1 + + # Disable global forwarding + ip netns exec $ns2 sysctl -qw net.ipv6.conf.all.forwarding=0 +} + +test_force_forwarding() { + local ret=0 + + echo "TEST: force_forwarding functionality" + + # Check if force_forwarding sysctl exists + if ! ip netns exec $ns2 test -f /proc/sys/net/ipv6/conf/veth21/force_forwarding; then + echo "SKIP: force_forwarding not available" + return $ksft_skip + fi + + # Test 1: Without force_forwarding, ping should fail + ip netns exec $ns2 sysctl -qw net.ipv6.conf.veth21.force_forwarding=0 + ip netns exec $ns2 sysctl -qw net.ipv6.conf.veth23.force_forwarding=0 + + if ip netns exec $ns1 ping -6 -c 1 -W 2 2001:db8:2::2 &>/dev/null; then + echo "FAIL: ping succeeded when forwarding disabled" + ret=1 + else + echo "PASS: forwarding disabled correctly" + fi + + # Test 2: With force_forwarding enabled, ping should succeed + ip netns exec $ns2 sysctl -qw net.ipv6.conf.veth21.force_forwarding=1 + ip netns exec $ns2 sysctl -qw net.ipv6.conf.veth23.force_forwarding=1 + + if ip netns exec $ns1 ping -6 -c 1 -W 2 2001:db8:2::2 &>/dev/null; then + echo "PASS: force_forwarding enabled forwarding" + else + echo "FAIL: ping failed with force_forwarding enabled" + ret=1 + fi + + return $ret +} + +echo "IPv6 force_forwarding test" +echo "==========================" + +setup_test +test_force_forwarding +ret=$? + +if [ $ret -eq 0 ]; then + echo "OK" + exit 0 +elif [ $ret -eq $ksft_skip ]; then + echo "SKIP" + exit $ksft_skip +else + echo "FAIL" + exit 1 +fi -- 2.39.5
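
The forwarding decision and the backwards-compatibility reset described in this patch can be sketched in plain userspace C. This is only an illustrative model, not kernel code: `struct devconf`, `may_forward()` and `set_all_forwarding()` are hypothetical names standing in for the relevant parts of `struct ipv6_devconf`, the check in `ip6_forward()`, and `addrconf_forward_change()`.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the two relevant ipv6_devconf fields. */
struct devconf {
	int forwarding;
	int force_forwarding;
};

/*
 * Models the check in ip6_forward(): a packet may be forwarded when the
 * global flag is set, or when the ingress interface exists and has
 * force_forwarding set. idev may be NULL.
 */
static bool may_forward(int all_forwarding, const struct devconf *idev)
{
	if (!all_forwarding && (!idev || !idev->force_forwarding))
		return false;
	return true;
}

/*
 * Models addrconf_forward_change(): the new conf.all.forwarding value is
 * propagated to every interface, and writing 0 also clears
 * force_forwarding on every interface.
 */
static void set_all_forwarding(int *all_forwarding, struct devconf *devs,
			       size_t n, int newf)
{
	*all_forwarding = newf;
	for (size_t i = 0; i < n; i++) {
		if (newf == 0)
			devs[i].force_forwarding = 0;
		devs[i].forwarding = newf;
	}
}
```

In this model, an interface with `force_forwarding=1` forwards even while the global flag is 0, and a later write of 0 to the global flag clears the per-interface flag again.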


[PATCH v4] selftests/mm: add process_madvise() tests

by wang lian

Add tests for process_madvise(), focusing on verifying behavior under
various conditions including valid usage and error cases.

Signed-off-by: wang lian <lianux.mm(a)gmail.com>
Suggested-by: Lorenzo Stoakes <lorenzo.stoakes(a)oracle.com>
Suggested-by: David Hildenbrand <david(a)redhat.com>
Suggested-by: Zi Yan <ziy(a)nvidia.com>
Acked-by: SeongJae Park <sj(a)kernel.org>
---
Changelog v4:
- Refine resource cleanup logic in test teardown to be more robust.
- Improve remote_collapse test to correctly handle different THP
  (Transparent Huge Page) policies ('always', 'madvise', 'never'),
  including handling race conditions with khugepaged.
- Resolve build errors

Changelog v3: https://lore.kernel.org/lkml/20250703044326.65061-1-lianux.mm@gmail.com/
- Rebased onto the latest mm-stable branch to ensure clean application.
- Refactor common signal handling logic into vm_util to reduce code
  duplication.
- Improve test robustness and diagnostics based on community feedback.
- Address minor code style and script corrections.

Changelog v2: https://lore.kernel.org/lkml/20250630140957.4000-1-lianux.mm@gmail.com/
- Drop MADV_DONTNEED tests based on feedback.
- Focus solely on process_madvise() syscall.
- Improve error handling and structure.
- Add future-proof flag test.
- Style and comment cleanups.
-V1: https://lore.kernel.org/lkml/20250621133003.4733-1-lianux.mm@gmail.com/ tools/testing/selftests/mm/.gitignore | 1 + tools/testing/selftests/mm/Makefile | 1 + tools/testing/selftests/mm/guard-regions.c | 51 --- tools/testing/selftests/mm/process_madv.c | 447 +++++++++++++++++++++ tools/testing/selftests/mm/run_vmtests.sh | 5 + tools/testing/selftests/mm/vm_util.c | 35 ++ tools/testing/selftests/mm/vm_util.h | 22 + 7 files changed, 511 insertions(+), 51 deletions(-) create mode 100644 tools/testing/selftests/mm/process_madv.c diff --git a/tools/testing/selftests/mm/.gitignore b/tools/testing/selftests/mm/.gitignore index 824266982aa3..95bd9c6ead9e 100644 --- a/tools/testing/selftests/mm/.gitignore +++ b/tools/testing/selftests/mm/.gitignore @@ -25,6 +25,7 @@ pfnmap protection_keys protection_keys_32 protection_keys_64 +process_madv madv_populate uffd-stress uffd-unit-tests diff --git a/tools/testing/selftests/mm/Makefile b/tools/testing/selftests/mm/Makefile index ae6f994d3add..d13b3cef2a2b 100644 --- a/tools/testing/selftests/mm/Makefile +++ b/tools/testing/selftests/mm/Makefile @@ -85,6 +85,7 @@ TEST_GEN_FILES += mseal_test TEST_GEN_FILES += on-fault-limit TEST_GEN_FILES += pagemap_ioctl TEST_GEN_FILES += pfnmap +TEST_GEN_FILES += process_madv TEST_GEN_FILES += thuge-gen TEST_GEN_FILES += transhuge-stress TEST_GEN_FILES += uffd-stress diff --git a/tools/testing/selftests/mm/guard-regions.c b/tools/testing/selftests/mm/guard-regions.c index 93af3d3760f9..4cf101b0fe5e 100644 --- a/tools/testing/selftests/mm/guard-regions.c +++ b/tools/testing/selftests/mm/guard-regions.c @@ -9,8 +9,6 @@ #include <linux/limits.h> #include <linux/userfaultfd.h> #include <linux/fs.h> -#include <setjmp.h> -#include <signal.h> #include <stdbool.h> #include <stdio.h> #include <stdlib.h> @@ -24,24 +22,6 @@ #include "../pidfd/pidfd.h" -/* - * Ignore the checkpatch warning, as per the C99 standard, section 7.14.1.1: - * - * "If the signal occurs other than as the result of calling the 
abort or raise - * function, the behavior is undefined if the signal handler refers to any - * object with static storage duration other than by assigning a value to an - * object declared as volatile sig_atomic_t" - */ -static volatile sig_atomic_t signal_jump_set; -static sigjmp_buf signal_jmp_buf; - -/* - * Ignore the checkpatch warning, we must read from x but don't want to do - * anything with it in order to trigger a read page fault. We therefore must use - * volatile to stop the compiler from optimising this away. - */ -#define FORCE_READ(x) (*(volatile typeof(x) *)x) - /* * How is the test backing the mapping being tested? */ @@ -120,14 +100,6 @@ static int userfaultfd(int flags) return syscall(SYS_userfaultfd, flags); } -static void handle_fatal(int c) -{ - if (!signal_jump_set) - return; - - siglongjmp(signal_jmp_buf, c); -} - static ssize_t sys_process_madvise(int pidfd, const struct iovec *iovec, size_t n, int advice, unsigned int flags) { @@ -180,29 +152,6 @@ static bool try_read_write_buf(char *ptr) return try_read_buf(ptr) && try_write_buf(ptr); } -static void setup_sighandler(void) -{ - struct sigaction act = { - .sa_handler = &handle_fatal, - .sa_flags = SA_NODEFER, - }; - - sigemptyset(&act.sa_mask); - if (sigaction(SIGSEGV, &act, NULL)) - ksft_exit_fail_perror("sigaction"); -} - -static void teardown_sighandler(void) -{ - struct sigaction act = { - .sa_handler = SIG_DFL, - .sa_flags = SA_NODEFER, - }; - - sigemptyset(&act.sa_mask); - sigaction(SIGSEGV, &act, NULL); -} - static int open_file(const char *prefix, char *path) { int fd; diff --git a/tools/testing/selftests/mm/process_madv.c b/tools/testing/selftests/mm/process_madv.c new file mode 100644 index 000000000000..7d7509486d46 --- /dev/null +++ b/tools/testing/selftests/mm/process_madv.c @@ -0,0 +1,447 @@ +// SPDX-License-Identifier: GPL-2.0-or-later + +#define _GNU_SOURCE +#include "../kselftest_harness.h" +#include <errno.h> +#include <setjmp.h> +#include <signal.h> +#include <stdbool.h> 
+#include <stdio.h> +#include <stdlib.h> +#include <string.h> +#include <linux/mman.h> +#include <sys/syscall.h> +#include <unistd.h> +#include <sched.h> +#include <linux/pidfd.h> +#include <linux/uio.h> +#include "vm_util.h" + +FIXTURE(process_madvise) +{ + int pidfd; + int flag; + pid_t child_pid; +}; + +FIXTURE_SETUP(process_madvise) +{ + self->pidfd = PIDFD_SELF; + self->flag = 0; + self->child_pid = -1; + setup_sighandler(); +}; + +FIXTURE_TEARDOWN_PARENT(process_madvise) +{ + teardown_sighandler(); + if (self->child_pid > 0) { + kill(self->child_pid, SIGKILL); + waitpid(self->child_pid, NULL, 0); + } +} + +static ssize_t sys_process_madvise(int pidfd, const struct iovec *iovec, + size_t vlen, int advice, unsigned int flags) +{ + return syscall(__NR_process_madvise, pidfd, iovec, vlen, advice, flags); +} + +/* + * Enable our signal catcher and try to read the specified buffer. The + * return value indicates whether the read succeeds without a fatal + * signal. + */ +static bool try_read_buf(char *ptr) +{ + bool failed; + + /* Tell signal handler to jump back here on fatal signal. */ + signal_jump_set = true; + /* If a fatal signal arose, we will jump back here and failed is set. */ + failed = sigsetjmp(signal_jmp_buf, 0) != 0; + + if (!failed) + FORCE_READ(ptr); + + signal_jump_set = false; + return !failed; +} + +TEST_F(process_madvise, basic) +{ + const unsigned long pagesize = (unsigned long)sysconf(_SC_PAGESIZE); + const int madvise_pages = 4; + char *map; + ssize_t ret; + struct iovec vec[madvise_pages]; + + /* + * Create a single large mapping. We will pick pages from this + * mapping to advise on. This ensures we test non-contiguous iovecs. + */ + map = mmap(NULL, pagesize * 10, PROT_READ | PROT_WRITE, + MAP_PRIVATE | MAP_ANONYMOUS, -1, 0); + if (map == MAP_FAILED) + ksft_exit_skip("mmap failed, not enough memory.\n"); + + /* Fill the entire region with a known pattern. 
*/ + memset(map, 'A', pagesize * 10); + + /* + * Setup the iovec to point to 4 non-contiguous pages + * within the mapping. + */ + vec[0].iov_base = &map[0 * pagesize]; + vec[0].iov_len = pagesize; + vec[1].iov_base = &map[3 * pagesize]; + vec[1].iov_len = pagesize; + vec[2].iov_base = &map[5 * pagesize]; + vec[2].iov_len = pagesize; + vec[3].iov_base = &map[8 * pagesize]; + vec[3].iov_len = pagesize; + + ret = sys_process_madvise(PIDFD_SELF, vec, madvise_pages, MADV_DONTNEED, + 0); + if (ret == -1 && errno == EPERM) + ksft_exit_skip( + "process_madvise() unsupported or permission denied, try running as root.\n"); + else if (errno == EINVAL) + ksft_exit_skip( + "process_madvise() unsupported or parameter invalid, please check arguments.\n"); + + /* The call should succeed and report the total bytes processed. */ + ASSERT_EQ(ret, madvise_pages * pagesize); + + /* Check that advised pages are now zero. */ + for (int i = 0; i < madvise_pages; i++) { + char *advised_page = (char *)vec[i].iov_base; + + /* Access should be successful (kernel provides a new page). */ + ASSERT_TRUE(try_read_buf(advised_page)); + /* Content must be 0, not 'A'. */ + ASSERT_EQ(*advised_page, 0); + } + + /* Check that an un-advised page in between is still 'A'. */ + char *unadvised_page = &map[1 * pagesize]; + + ASSERT_TRUE(try_read_buf(unadvised_page)); + for (int i = 0; i < pagesize; i++) + ASSERT_EQ(unadvised_page[i], 'A'); + + /* Cleanup. 
*/ + ASSERT_EQ(munmap(map, pagesize * 10), 0); +} + +static long get_smaps_anon_huge_pages(pid_t pid, void *addr) +{ + char smaps_path[64]; + char *line = NULL; + unsigned long start, end; + long anon_huge_kb; + size_t len; + FILE *f; + bool in_vma; + + in_vma = false; + snprintf(smaps_path, sizeof(smaps_path), "/proc/%d/smaps", pid); + f = fopen(smaps_path, "r"); + if (!f) + return -1; + + while (getline(&line, &len, f) != -1) { + /* Check if the line describes a VMA range */ + if (sscanf(line, "%lx-%lx", &start, &end) == 2) { + if ((unsigned long)addr >= start && + (unsigned long)addr < end) + in_vma = true; + else + in_vma = false; + continue; + } + + /* If we are in the correct VMA, look for the AnonHugePages field */ + if (in_vma && + sscanf(line, "AnonHugePages: %ld kB", &anon_huge_kb) == 1) + break; + } + + free(line); + fclose(f); + + return (anon_huge_kb > 0) ? (anon_huge_kb * 1024) : 0; +} + +static bool is_thp_always(void) +{ + const char *path = "/sys/kernel/mm/transparent_hugepage/enabled"; + char buf[32]; + FILE *f = fopen(path, "r"); + + if (!f) + return false; + + if (fgets(buf, sizeof(buf), f)) + if (strstr(buf, "[always]")) { + fclose(f); + return true; + } + + fclose(f); + return false; +} + +/** + * TEST_F(process_madvise, remote_collapse) + * + * This test deterministically validates process_madvise() with MADV_COLLAPSE + * on a remote process, other advices are difficult to verify reliably. + * + * The test verifies that a memory region in a child process, initially + * backed by small pages, can be collapsed into a Transparent Huge Page by a + * request from the parent. The result is verified by parsing the child's + * /proc/<pid>/smaps file. 
+ */ +TEST_F(process_madvise, remote_collapse) +{ + const unsigned long pagesize = (unsigned long)sysconf(_SC_PAGESIZE); + int pidfd; + long huge_page_size; + int pipe_info[2]; + ssize_t ret; + struct iovec vec; + + struct child_info { + pid_t pid; + void *map_addr; + } info; + + huge_page_size = default_huge_page_size(); + if (huge_page_size <= 0) + ksft_exit_skip("Could not determine a valid huge page size.\n"); + + ASSERT_EQ(pipe(pipe_info), 0); + + self->child_pid = fork(); + ASSERT_NE(self->child_pid, -1); + + if (self->child_pid == 0) { + char *map; + size_t map_size = 2 * huge_page_size; + + close(pipe_info[0]); + + map = mmap(NULL, map_size, PROT_READ | PROT_WRITE, + MAP_PRIVATE | MAP_ANONYMOUS, -1, 0); + ASSERT_NE(map, MAP_FAILED); + + /* Fault in as small pages */ + for (size_t i = 0; i < map_size; i += pagesize) + map[i] = 'A'; + + /* Send info and pause */ + info.pid = getpid(); + info.map_addr = map; + ret = write(pipe_info[1], &info, sizeof(info)); + ASSERT_EQ(ret, sizeof(info)); + close(pipe_info[1]); + + pause(); + exit(0); + } + + close(pipe_info[1]); + + /* Receive child info */ + ret = read(pipe_info[0], &info, sizeof(info)); + if (ret <= 0) { + waitpid(self->child_pid, NULL, 0); + ksft_exit_skip("Failed to read child info from pipe.\n"); + } + ASSERT_EQ(ret, sizeof(info)); + close(pipe_info[0]); + self->child_pid = info.pid; + + pidfd = syscall(__NR_pidfd_open, self->child_pid, 0); + ASSERT_GE(pidfd, 0); + + vec.iov_base = info.map_addr; + vec.iov_len = huge_page_size; + + if (is_thp_always()) { + long initial_huge_pages; + + /* + * When THP is 'always', khugepaged may pre-emptively + * collapse the pages before our MADV_COLLAPSE call. Check + * the initial state to provide a more accurate test report. + */ + initial_huge_pages = + get_smaps_anon_huge_pages(self->child_pid, info.map_addr); + + if (initial_huge_pages == 2 * huge_page_size) { + /* + * The pages were already collapsed by khugepaged. 
+ * The test goal narrows to verifying that MADV_COLLAPSE + * correctly returns success on an already-collapsed + * region, as documented. + */ + ksft_test_result_skip( + "THP is 'always' and pages were pre-collapsed; verifying success on already-collapsed page.\n"); + + ret = sys_process_madvise(pidfd, &vec, 1, MADV_COLLAPSE, + 0); + ASSERT_EQ(ret, huge_page_size); + goto cleanup; + } + + /* + * Pages are still small, creating a race between our call + * and khugepaged. This is the main test scenario for 'always'. + */ + ret = sys_process_madvise(pidfd, &vec, 1, MADV_COLLAPSE, 0); + + if (ret == -1) { + /* + * MADV_COLLAPSE lost the race to khugepaged, which + * likely held a page lock. The kernel correctly + * reports this temporary contention with EAGAIN. + */ + if (errno == EAGAIN) { + ksft_test_result_skip( + "THP is 'always', process_madvise returned EAGAIN due to an expected race with khugepaged.\n"); + } else { + ksft_test_result_fail( + "process_madvise failed with unexpected errno %d in 'always' mode.\n", + errno); + } + goto cleanup; + } + + /* + * MADV_COLLAPSE won the race and successfully collapsed + * the pages. Verify the final state. + */ + ASSERT_EQ(ret, huge_page_size); + ASSERT_EQ(get_smaps_anon_huge_pages(self->child_pid, info.map_addr), + huge_page_size); + ksft_test_result_pass( + "THP is 'always', MADV_COLLAPSE won race and collapsed pages.\n"); + goto cleanup; + } + + /* + * THP is 'madvise' or 'never'. No race is expected with khugepaged. + * We can perform a straightforward state-change verification. 
+ */ + ASSERT_EQ(get_smaps_anon_huge_pages(self->child_pid, info.map_addr), 0); + + ret = sys_process_madvise(pidfd, &vec, 1, MADV_COLLAPSE, 0); + if (ret == -1) { + if (errno == EINVAL) + ksft_exit_skip( + "PROCESS_MADV_ADVISE is not supported.\n"); + else if (errno == EPERM) + ksft_exit_skip( + "No process_madvise() permissions, try running as root.\n"); + goto cleanup; + } + ASSERT_EQ(ret, huge_page_size); + + ASSERT_EQ(get_smaps_anon_huge_pages(self->child_pid, info.map_addr), + huge_page_size); + + ksft_test_result_pass( + "MADV_COLLAPSE successfully verified via smaps.\n"); + +cleanup: + /* Cleanup */ + kill(self->child_pid, SIGKILL); + waitpid(self->child_pid, NULL, 0); + if (pidfd >= 0) + close(pidfd); +} + +/* + * Test process_madvise() with various invalid pidfds to ensure correct error + * handling. This includes negative fds, non-pidfd fds, and pidfds for + * processes that no longer exist. + */ +TEST_F(process_madvise, invalid_pidfd) +{ + struct iovec vec; + pid_t child_pid; + ssize_t ret; + int pidfd; + + vec.iov_base = (void *)0x1234; + vec.iov_len = 4096; + + /* Using an invalid fd number (-1) should fail with EBADF. */ + ret = sys_process_madvise(-1, &vec, 1, MADV_DONTNEED, 0); + ASSERT_EQ(ret, -1); + ASSERT_EQ(errno, EBADF); + + /* + * Using a valid fd that is not a pidfd (e.g. stdin) should fail + * with EBADF. + */ + ret = sys_process_madvise(STDIN_FILENO, &vec, 1, MADV_DONTNEED, 0); + ASSERT_EQ(ret, -1); + ASSERT_EQ(errno, EBADF); + + /* + * Using a pidfd for a process that has already exited should fail + * with ESRCH. + */ + child_pid = fork(); + ASSERT_NE(child_pid, -1); + + if (child_pid == 0) + exit(0); + + pidfd = syscall(__NR_pidfd_open, child_pid, 0); + ASSERT_GE(pidfd, 0); + + /* Wait for the child to ensure it has terminated. 
*/ + waitpid(child_pid, NULL, 0); + + ret = sys_process_madvise(pidfd, &vec, 1, MADV_DONTNEED, 0); + ASSERT_EQ(ret, -1); + ASSERT_EQ(errno, ESRCH); + close(pidfd); +} + +/* + * Test process_madvise() with an invalid flag value. Now we only support flag=0 + * future we will use it support sync so reserve this test. + */ +TEST_F(process_madvise, flag) +{ + const unsigned long pagesize = (unsigned long)sysconf(_SC_PAGESIZE); + unsigned int invalid_flag; + struct iovec vec; + char *map; + ssize_t ret; + + map = mmap(NULL, pagesize, PROT_READ, MAP_PRIVATE | MAP_ANONYMOUS, -1, + 0); + if (map == MAP_FAILED) + ksft_exit_skip("mmap failed, not enough memory.\n"); + + vec.iov_base = map; + vec.iov_len = pagesize; + + invalid_flag = 0x80000000; + + ret = sys_process_madvise(PIDFD_SELF, &vec, 1, MADV_DONTNEED, + invalid_flag); + ASSERT_EQ(ret, -1); + ASSERT_EQ(errno, EINVAL); + + /* Cleanup. */ + ASSERT_EQ(munmap(map, pagesize), 0); +} + +TEST_HARNESS_MAIN diff --git a/tools/testing/selftests/mm/run_vmtests.sh b/tools/testing/selftests/mm/run_vmtests.sh index dddd1dd8af14..84fb51902c3e 100755 --- a/tools/testing/selftests/mm/run_vmtests.sh +++ b/tools/testing/selftests/mm/run_vmtests.sh @@ -65,6 +65,8 @@ separated by spaces: test pagemap_scan IOCTL - pfnmap tests for VM_PFNMAP handling +- process_madv + test process_madvise - cow test copy-on-write semantics - thp @@ -422,6 +424,9 @@ CATEGORY="hmm" run_test bash ./test_hmm.sh smoke # MADV_GUARD_INSTALL and MADV_GUARD_REMOVE tests CATEGORY="madv_guard" run_test ./guard-regions +# PROCESS_MADVISE TEST +CATEGORY="process_madv" run_test ./process_madv + # MADV_POPULATE_READ and MADV_POPULATE_WRITE tests CATEGORY="madv_populate" run_test ./madv_populate diff --git a/tools/testing/selftests/mm/vm_util.c b/tools/testing/selftests/mm/vm_util.c index 5492e3f784df..85b209260e5a 100644 --- a/tools/testing/selftests/mm/vm_util.c +++ b/tools/testing/selftests/mm/vm_util.c @@ -20,6 +20,9 @@ unsigned int __page_size; unsigned int 
__page_shift; +volatile sig_atomic_t signal_jump_set; +sigjmp_buf signal_jmp_buf; + uint64_t pagemap_get_entry(int fd, char *start) { const unsigned long pfn = (unsigned long)start / getpagesize(); @@ -524,3 +527,35 @@ int read_sysfs(const char *file_path, unsigned long *val) return 0; } + +static void handle_fatal(int c) +{ + if (!signal_jump_set) + return; + + siglongjmp(signal_jmp_buf, c); +} + +void setup_sighandler(void) +{ + struct sigaction act = { + .sa_handler = &handle_fatal, + .sa_flags = SA_NODEFER, + }; + + sigemptyset(&act.sa_mask); + if (sigaction(SIGSEGV, &act, NULL)) + ksft_exit_fail_perror("sigaction in setup"); +} + +void teardown_sighandler(void) +{ + struct sigaction act = { + .sa_handler = SIG_DFL, + .sa_flags = SA_NODEFER, + }; + + sigemptyset(&act.sa_mask); + if (sigaction(SIGSEGV, &act, NULL)) + ksft_exit_fail_perror("sigaction in teardown"); +} diff --git a/tools/testing/selftests/mm/vm_util.h b/tools/testing/selftests/mm/vm_util.h index b8136d12a0f8..6bc4177a2807 100644 --- a/tools/testing/selftests/mm/vm_util.h +++ b/tools/testing/selftests/mm/vm_util.h @@ -8,6 +8,8 @@ #include <unistd.h> /* _SC_PAGESIZE */ #include "../kselftest.h" #include <linux/fs.h> +#include <setjmp.h> +#include <signal.h> #define BIT_ULL(nr) (1ULL << (nr)) #define PM_SOFT_DIRTY BIT_ULL(55) @@ -61,6 +63,24 @@ static inline void skip_test_dodgy_fs(const char *op_name) ksft_test_result_skip("%s failed with ENOENT. 
Filesystem might be buggy (9pfs?)\n", op_name); } +/* + * Ignore the checkpatch warning, as per the C99 standard, section 7.14.1.1: + * + * "If the signal occurs other than as the result of calling the abort or raise + * function, the behavior is undefined if the signal handler refers to any + * object with static storage duration other than by assigning a value to an + * object declared as volatile sig_atomic_t" + */ +extern volatile sig_atomic_t signal_jump_set; +extern sigjmp_buf signal_jmp_buf; + +/* + * Ignore the checkpatch warning, we must read from x but don't want to do + * anything with it in order to trigger a read page fault. We therefore must use + * volatile to stop the compiler from optimising this away. + */ +#define FORCE_READ(x) (*(volatile typeof(x) *)x) + uint64_t pagemap_get_entry(int fd, char *start); bool pagemap_is_softdirty(int fd, char *start); bool pagemap_is_swapped(int fd, char *start); @@ -90,6 +110,8 @@ bool find_vma_procmap(struct procmap_fd *procmap, void *address); int close_procmap(struct procmap_fd *procmap); int write_sysfs(const char *file_path, unsigned long val); int read_sysfs(const char *file_path, unsigned long *val); +void setup_sighandler(void); +void teardown_sighandler(void); static inline int open_self_procmap(struct procmap_fd *procmap_out) { -- 2.43.0


[PATCH v3] selftests/mm: add process_madvise() tests

by wang lian

Add tests for process_madvise(), focusing on verifying behavior under various conditions including valid usage and error cases. Signed-off-by: wang lian <lianux.mm(a)gmail.com> Suggested-by: Lorenzo Stoakes <lorenzo.stoakes(a)oracle.com> Suggested-by: David Hildenbrand <david(a)redhat.com> Acked-by: SeongJae Park <sj(a)kernel.org> --- Changelog v3: - Rebased onto the latest mm-stable branch to ensure clean application. - Refactor common signal handling logic into vm_util to reduce code duplication. - Improve test robustness and diagnostics based on community feedback. - Address minor code style and script corrections. Changelog v2: - Drop MADV_DONTNEED tests based on feedback. - Focus solely on process_madvise() syscall. - Improve error handling and structure. - Add future-proof flag test. - Style and comment cleanups. tools/testing/selftests/mm/.gitignore | 1 + tools/testing/selftests/mm/Makefile | 1 + tools/testing/selftests/mm/guard-regions.c | 51 --- tools/testing/selftests/mm/process_madv.c | 358 +++++++++++++++++++++ tools/testing/selftests/mm/run_vmtests.sh | 5 + tools/testing/selftests/mm/vm_util.c | 35 ++ tools/testing/selftests/mm/vm_util.h | 22 ++ 7 files changed, 422 insertions(+), 51 deletions(-) create mode 100644 tools/testing/selftests/mm/process_madv.c diff --git a/tools/testing/selftests/mm/.gitignore b/tools/testing/selftests/mm/.gitignore index 824266982aa3..95bd9c6ead9e 100644 --- a/tools/testing/selftests/mm/.gitignore +++ b/tools/testing/selftests/mm/.gitignore @@ -25,6 +25,7 @@ pfnmap protection_keys protection_keys_32 protection_keys_64 +process_madv madv_populate uffd-stress uffd-unit-tests diff --git a/tools/testing/selftests/mm/Makefile b/tools/testing/selftests/mm/Makefile index ae6f994d3add..d13b3cef2a2b 100644 --- a/tools/testing/selftests/mm/Makefile +++ b/tools/testing/selftests/mm/Makefile @@ -85,6 +85,7 @@ TEST_GEN_FILES += mseal_test TEST_GEN_FILES += on-fault-limit TEST_GEN_FILES += pagemap_ioctl TEST_GEN_FILES += pfnmap 
+TEST_GEN_FILES += process_madv TEST_GEN_FILES += thuge-gen TEST_GEN_FILES += transhuge-stress TEST_GEN_FILES += uffd-stress diff --git a/tools/testing/selftests/mm/guard-regions.c b/tools/testing/selftests/mm/guard-regions.c index 93af3d3760f9..4cf101b0fe5e 100644 --- a/tools/testing/selftests/mm/guard-regions.c +++ b/tools/testing/selftests/mm/guard-regions.c @@ -9,8 +9,6 @@ #include <linux/limits.h> #include <linux/userfaultfd.h> #include <linux/fs.h> -#include <setjmp.h> -#include <signal.h> #include <stdbool.h> #include <stdio.h> #include <stdlib.h> @@ -24,24 +22,6 @@ #include "../pidfd/pidfd.h" -/* - * Ignore the checkpatch warning, as per the C99 standard, section 7.14.1.1: - * - * "If the signal occurs other than as the result of calling the abort or raise - * function, the behavior is undefined if the signal handler refers to any - * object with static storage duration other than by assigning a value to an - * object declared as volatile sig_atomic_t" - */ -static volatile sig_atomic_t signal_jump_set; -static sigjmp_buf signal_jmp_buf; - -/* - * Ignore the checkpatch warning, we must read from x but don't want to do - * anything with it in order to trigger a read page fault. We therefore must use - * volatile to stop the compiler from optimising this away. - */ -#define FORCE_READ(x) (*(volatile typeof(x) *)x) - /* * How is the test backing the mapping being tested? 
*/ @@ -120,14 +100,6 @@ static int userfaultfd(int flags) return syscall(SYS_userfaultfd, flags); } -static void handle_fatal(int c) -{ - if (!signal_jump_set) - return; - - siglongjmp(signal_jmp_buf, c); -} - static ssize_t sys_process_madvise(int pidfd, const struct iovec *iovec, size_t n, int advice, unsigned int flags) { @@ -180,29 +152,6 @@ static bool try_read_write_buf(char *ptr) return try_read_buf(ptr) && try_write_buf(ptr); } -static void setup_sighandler(void) -{ - struct sigaction act = { - .sa_handler = &handle_fatal, - .sa_flags = SA_NODEFER, - }; - - sigemptyset(&act.sa_mask); - if (sigaction(SIGSEGV, &act, NULL)) - ksft_exit_fail_perror("sigaction"); -} - -static void teardown_sighandler(void) -{ - struct sigaction act = { - .sa_handler = SIG_DFL, - .sa_flags = SA_NODEFER, - }; - - sigemptyset(&act.sa_mask); - sigaction(SIGSEGV, &act, NULL); -} - static int open_file(const char *prefix, char *path) { int fd; diff --git a/tools/testing/selftests/mm/process_madv.c b/tools/testing/selftests/mm/process_madv.c new file mode 100644 index 000000000000..3d26105b4781 --- /dev/null +++ b/tools/testing/selftests/mm/process_madv.c @@ -0,0 +1,358 @@ +// SPDX-License-Identifier: GPL-2.0-or-later + +#define _GNU_SOURCE +#include "../kselftest_harness.h" +#include <errno.h> +#include <setjmp.h> +#include <signal.h> +#include <stdbool.h> +#include <stdio.h> +#include <stdlib.h> +#include <string.h> +#include <sys/mman.h> +#include <sys/syscall.h> +#include <unistd.h> +#include <sched.h> +#include <sys/pidfd.h> +#include "vm_util.h" + +#include "../pidfd/pidfd.h" + +FIXTURE(process_madvise) +{ + int pidfd; + int flag; +}; + +FIXTURE_SETUP(process_madvise) +{ + self->pidfd = PIDFD_SELF; + self->flag = 0; + setup_sighandler(); +}; + +FIXTURE_TEARDOWN(process_madvise) +{ + teardown_sighandler(); +} + +static ssize_t sys_process_madvise(int pidfd, const struct iovec *iovec, + size_t vlen, int advice, unsigned int flags) +{ + return syscall(__NR_process_madvise, pidfd, 
iovec, vlen, advice, flags); +} + +/* + * Enable our signal catcher and try to read the specified buffer. The + * return value indicates whether the read succeeds without a fatal + * signal. + */ +static bool try_read_buf(char *ptr) +{ + bool failed; + + /* Tell signal handler to jump back here on fatal signal. */ + signal_jump_set = true; + /* If a fatal signal arose, we will jump back here and failed is set. */ + failed = sigsetjmp(signal_jmp_buf, 0) != 0; + + if (!failed) + FORCE_READ(ptr); + + signal_jump_set = false; + return !failed; +} + +TEST_F(process_madvise, basic) +{ + const unsigned long pagesize = (unsigned long)sysconf(_SC_PAGESIZE); + const int madvise_pages = 4; + char *map; + ssize_t ret; + struct iovec vec[madvise_pages]; + + /* + * Create a single large mapping. We will pick pages from this + * mapping to advise on. This ensures we test non-contiguous iovecs. + */ + map = mmap(NULL, pagesize * 10, PROT_READ | PROT_WRITE, + MAP_PRIVATE | MAP_ANONYMOUS, -1, 0); + if (map == MAP_FAILED) + ksft_exit_skip("mmap failed, not enough memory.\n"); + + /* Fill the entire region with a known pattern. */ + memset(map, 'A', pagesize * 10); + + /* + * Setup the iovec to point to 4 non-contiguous pages + * within the mapping. + */ + vec[0].iov_base = &map[0 * pagesize]; + vec[0].iov_len = pagesize; + vec[1].iov_base = &map[3 * pagesize]; + vec[1].iov_len = pagesize; + vec[2].iov_base = &map[5 * pagesize]; + vec[2].iov_len = pagesize; + vec[3].iov_base = &map[8 * pagesize]; + vec[3].iov_len = pagesize; + + ret = sys_process_madvise(PIDFD_SELF, vec, madvise_pages, MADV_DONTNEED, + 0); + if (ret == -1 && errno == EPERM) + ksft_exit_skip( + "process_madvise() unsupported or permission denied, try running as root.\n"); + else if (errno == EINVAL) + ksft_exit_skip( + "process_madvise() unsupported or parameter invalid, please check arguments.\n"); + + /* The call should succeed and report the total bytes processed. 
*/ + ASSERT_EQ(ret, madvise_pages * pagesize); + + /* Check that advised pages are now zero. */ + for (int i = 0; i < madvise_pages; i++) { + char *advised_page = (char *)vec[i].iov_base; + + /* Access should be successful (kernel provides a new page). */ + ASSERT_TRUE(try_read_buf(advised_page)); + /* Content must be 0, not 'A'. */ + ASSERT_EQ(*advised_page, 0); + } + + /* Check that an un-advised page in between is still 'A'. */ + char *unadvised_page = &map[1 * pagesize]; + + ASSERT_TRUE(try_read_buf(unadvised_page)); + for (int i = 0; i < pagesize; i++) + ASSERT_EQ(unadvised_page[i], 'A'); + + /* Cleanup. */ + ASSERT_EQ(munmap(map, pagesize * 10), 0); +} + +static long get_smaps_anon_huge_pages(pid_t pid, void *addr) +{ + char smaps_path[64]; + char *line = NULL; + unsigned long start, end; + long anon_huge_kb; + size_t len; + FILE *f; + bool in_vma; + + in_vma = false; + snprintf(smaps_path, sizeof(smaps_path), "/proc/%d/smaps", pid); + f = fopen(smaps_path, "r"); + if (!f) + return -1; + + while (getline(&line, &len, f) != -1) { + /* Check if the line describes a VMA range */ + if (sscanf(line, "%lx-%lx", &start, &end) == 2) { + if ((unsigned long)addr >= start && + (unsigned long)addr < end) + in_vma = true; + else + in_vma = false; + continue; + } + + /* If we are in the correct VMA, look for the AnonHugePages field */ + if (in_vma && + sscanf(line, "AnonHugePages: %ld kB", &anon_huge_kb) == 1) + break; + } + + free(line); + fclose(f); + + return (anon_huge_kb > 0) ? (anon_huge_kb * 1024) : 0; +} + +/** + * TEST_F(process_madvise, remote_collapse) + * + * This test deterministically validates process_madvise() with MADV_COLLAPSE + * on a remote process, other advices are difficult to verify reliably. + * + * The test verifies that a memory region in a child process, initially + * backed by small pages, can be collapsed into a Transparent Huge Page by a + * request from the parent. The result is verified by parsing the child's + * /proc/<pid>/smaps file. 
+ */ +TEST_F(process_madvise, remote_collapse) +{ + const unsigned long pagesize = (unsigned long)sysconf(_SC_PAGESIZE); + pid_t child_pid; + int pidfd; + long huge_page_size; + int pipe_info[2]; + ssize_t ret; + struct iovec vec; + + struct child_info { + pid_t pid; + void *map_addr; + } info; + + huge_page_size = default_huge_page_size(); + if (huge_page_size <= 0) + ksft_exit_skip("Could not determine a valid huge page size.\n"); + + ASSERT_EQ(pipe(pipe_info), 0); + + child_pid = fork(); + ASSERT_NE(child_pid, -1); + + if (child_pid == 0) { + char *map; + size_t map_size = 2 * huge_page_size; + + close(pipe_info[0]); + + map = mmap(NULL, map_size, PROT_READ | PROT_WRITE, + MAP_PRIVATE | MAP_ANONYMOUS, -1, 0); + ASSERT_NE(map, MAP_FAILED); + + /* Fault in as small pages */ + for (size_t i = 0; i < map_size; i += pagesize) + map[i] = 'A'; + + /* Send info and pause */ + info.pid = getpid(); + info.map_addr = map; + ret = write(pipe_info[1], &info, sizeof(info)); + ASSERT_EQ(ret, sizeof(info)); + close(pipe_info[1]); + + pause(); + exit(0); + } + + close(pipe_info[1]); + + /* Receive child info */ + ret = read(pipe_info[0], &info, sizeof(info)); + if (ret <= 0) { + waitpid(child_pid, NULL, 0); + ksft_exit_skip("Failed to read child info from pipe.\n"); + } + ASSERT_EQ(ret, sizeof(info)); + close(pipe_info[0]); + child_pid = info.pid; + + pidfd = pidfd_open(child_pid, 0); + ASSERT_GE(pidfd, 0); + + /* Baseline Check from Parent's perspective */ + ASSERT_EQ(get_smaps_anon_huge_pages(child_pid, info.map_addr), 0); + + vec.iov_base = info.map_addr; + vec.iov_len = huge_page_size; + ret = sys_process_madvise(pidfd, &vec, 1, MADV_COLLAPSE, 0); + if (ret == -1) { + if (errno == EINVAL) + ksft_exit_skip( + "PROCESS_MADV_ADVISE is not supported.\n"); + else if (errno == EPERM) + ksft_exit_skip( + "No process_madvise() permissions, try running as root.\n"); + goto cleanup; + } + ASSERT_EQ(ret, huge_page_size); + + ASSERT_EQ(get_smaps_anon_huge_pages(child_pid, 
info.map_addr), + huge_page_size); + + ksft_test_result_pass( + "MADV_COLLAPSE successfully verified via smaps.\n"); + +cleanup: + /* Cleanup */ + kill(child_pid, SIGKILL); + waitpid(child_pid, NULL, 0); + if (pidfd >= 0) + close(pidfd); +} + +/* + * Test process_madvise() with various invalid pidfds to ensure correct error + * handling. This includes negative fds, non-pidfd fds, and pidfds for + * processes that no longer exist. + */ +TEST_F(process_madvise, invalid_pidfd) +{ + struct iovec vec; + pid_t child_pid; + ssize_t ret; + int pidfd; + + vec.iov_base = (void *)0x1234; + vec.iov_len = 4096; + + /* Using an invalid fd number (-1) should fail with EBADF. */ + ret = sys_process_madvise(-1, &vec, 1, MADV_DONTNEED, 0); + ASSERT_EQ(ret, -1); + ASSERT_EQ(errno, EBADF); + + /* + * Using a valid fd that is not a pidfd (e.g. stdin) should fail + * with EBADF. + */ + ret = sys_process_madvise(STDIN_FILENO, &vec, 1, MADV_DONTNEED, 0); + ASSERT_EQ(ret, -1); + ASSERT_EQ(errno, EBADF); + + /* + * Using a pidfd for a process that has already exited should fail + * with ESRCH. + */ + child_pid = fork(); + ASSERT_NE(child_pid, -1); + + if (child_pid == 0) + exit(0); + + pidfd = pidfd_open(child_pid, 0); + ASSERT_GE(pidfd, 0); + + /* Wait for the child to ensure it has terminated. */ + waitpid(child_pid, NULL, 0); + + ret = sys_process_madvise(pidfd, &vec, 1, MADV_DONTNEED, 0); + ASSERT_EQ(ret, -1); + ASSERT_EQ(errno, ESRCH); + close(pidfd); +} + +/* + * Test process_madvise() with an invalid flag value. Now we only support flag=0 + * future we will use it support sync so reserve this test. 
+ */ +TEST_F(process_madvise, flag) +{ + const unsigned long pagesize = (unsigned long)sysconf(_SC_PAGESIZE); + unsigned int invalid_flag; + struct iovec vec; + char *map; + ssize_t ret; + + map = mmap(NULL, pagesize, PROT_READ, MAP_PRIVATE | MAP_ANONYMOUS, -1, + 0); + if (map == MAP_FAILED) + ksft_exit_skip("mmap failed, not enough memory.\n"); + + vec.iov_base = map; + vec.iov_len = pagesize; + + invalid_flag = 0x80000000; + + ret = sys_process_madvise(PIDFD_SELF, &vec, 1, MADV_DONTNEED, + invalid_flag); + ASSERT_EQ(ret, -1); + ASSERT_EQ(errno, EINVAL); + + /* Cleanup. */ + ASSERT_EQ(munmap(map, pagesize), 0); +} + +TEST_HARNESS_MAIN diff --git a/tools/testing/selftests/mm/run_vmtests.sh b/tools/testing/selftests/mm/run_vmtests.sh index dddd1dd8af14..84fb51902c3e 100755 --- a/tools/testing/selftests/mm/run_vmtests.sh +++ b/tools/testing/selftests/mm/run_vmtests.sh @@ -65,6 +65,8 @@ separated by spaces: test pagemap_scan IOCTL - pfnmap tests for VM_PFNMAP handling +- process_madv + test process_madvise - cow test copy-on-write semantics - thp @@ -422,6 +424,9 @@ CATEGORY="hmm" run_test bash ./test_hmm.sh smoke # MADV_GUARD_INSTALL and MADV_GUARD_REMOVE tests CATEGORY="madv_guard" run_test ./guard-regions +# PROCESS_MADVISE TEST +CATEGORY="process_madv" run_test ./process_madv + # MADV_POPULATE_READ and MADV_POPULATE_WRITE tests CATEGORY="madv_populate" run_test ./madv_populate diff --git a/tools/testing/selftests/mm/vm_util.c b/tools/testing/selftests/mm/vm_util.c index 5492e3f784df..85b209260e5a 100644 --- a/tools/testing/selftests/mm/vm_util.c +++ b/tools/testing/selftests/mm/vm_util.c @@ -20,6 +20,9 @@ unsigned int __page_size; unsigned int __page_shift; +volatile sig_atomic_t signal_jump_set; +sigjmp_buf signal_jmp_buf; + uint64_t pagemap_get_entry(int fd, char *start) { const unsigned long pfn = (unsigned long)start / getpagesize(); @@ -524,3 +527,35 @@ int read_sysfs(const char *file_path, unsigned long *val) return 0; } + +static void handle_fatal(int c) +{ 
+ if (!signal_jump_set) + return; + + siglongjmp(signal_jmp_buf, c); +} + +void setup_sighandler(void) +{ + struct sigaction act = { + .sa_handler = &handle_fatal, + .sa_flags = SA_NODEFER, + }; + + sigemptyset(&act.sa_mask); + if (sigaction(SIGSEGV, &act, NULL)) + ksft_exit_fail_perror("sigaction in setup"); +} + +void teardown_sighandler(void) +{ + struct sigaction act = { + .sa_handler = SIG_DFL, + .sa_flags = SA_NODEFER, + }; + + sigemptyset(&act.sa_mask); + if (sigaction(SIGSEGV, &act, NULL)) + ksft_exit_fail_perror("sigaction in teardown"); +} diff --git a/tools/testing/selftests/mm/vm_util.h b/tools/testing/selftests/mm/vm_util.h index b8136d12a0f8..6bc4177a2807 100644 --- a/tools/testing/selftests/mm/vm_util.h +++ b/tools/testing/selftests/mm/vm_util.h @@ -8,6 +8,8 @@ #include <unistd.h> /* _SC_PAGESIZE */ #include "../kselftest.h" #include <linux/fs.h> +#include <setjmp.h> +#include <signal.h> #define BIT_ULL(nr) (1ULL << (nr)) #define PM_SOFT_DIRTY BIT_ULL(55) @@ -61,6 +63,24 @@ static inline void skip_test_dodgy_fs(const char *op_name) ksft_test_result_skip("%s failed with ENOENT. Filesystem might be buggy (9pfs?)\n", op_name); } +/* + * Ignore the checkpatch warning, as per the C99 standard, section 7.14.1.1: + * + * "If the signal occurs other than as the result of calling the abort or raise + * function, the behavior is undefined if the signal handler refers to any + * object with static storage duration other than by assigning a value to an + * object declared as volatile sig_atomic_t" + */ +extern volatile sig_atomic_t signal_jump_set; +extern sigjmp_buf signal_jmp_buf; + +/* + * Ignore the checkpatch warning, we must read from x but don't want to do + * anything with it in order to trigger a read page fault. We therefore must use + * volatile to stop the compiler from optimising this away. 
+ */ +#define FORCE_READ(x) (*(volatile typeof(x) *)x) + uint64_t pagemap_get_entry(int fd, char *start); bool pagemap_is_softdirty(int fd, char *start); bool pagemap_is_swapped(int fd, char *start); @@ -90,6 +110,8 @@ bool find_vma_procmap(struct procmap_fd *procmap, void *address); int close_procmap(struct procmap_fd *procmap); int write_sysfs(const char *file_path, unsigned long val); int read_sysfs(const char *file_path, unsigned long *val); +void setup_sighandler(void); +void teardown_sighandler(void); static inline int open_self_procmap(struct procmap_fd *procmap_out) { -- 2.43.0
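The try_read_buf() helper in the patch above arms a sigsetjmp()/siglongjmp() pair around a forced read so that a SIGSEGV becomes a boolean result instead of killing the test. A minimal standalone sketch of that pattern might look like the following. It is not part of the patch: probe_demo() and install_segv_handler() are hypothetical names used only for illustration, and unlike the selftest handler this sketch restores the default SIGSEGV disposition when no jump target is armed.

```c
#define _GNU_SOURCE
#include <setjmp.h>
#include <signal.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Only a volatile sig_atomic_t may be safely written/read around a handler. */
static volatile sig_atomic_t signal_jump_set;
static sigjmp_buf signal_jmp_buf;

/* SIGSEGV handler: jump back to the armed sigsetjmp() point, if there is one. */
static void handle_fatal(int sig)
{
	if (!signal_jump_set) {
		/* Assumption of this sketch: fall back to the default action. */
		signal(sig, SIG_DFL);
		return;
	}
	siglongjmp(signal_jmp_buf, sig);
}

/* Hypothetical helper: install the handler, returns 0 on success. */
static int install_segv_handler(void)
{
	struct sigaction act = {
		.sa_handler = handle_fatal,
		.sa_flags = SA_NODEFER,	/* allow a later fault to be caught again */
	};

	sigemptyset(&act.sa_mask);
	return sigaction(SIGSEGV, &act, NULL);
}

/* Returns true if one byte at ptr can be read without a fatal signal. */
static bool try_read_buf(const char *ptr)
{
	volatile bool ok = false;

	signal_jump_set = 1;
	if (sigsetjmp(signal_jmp_buf, 0) == 0) {
		/* Volatile access so the compiler cannot drop the read. */
		(void)*(const volatile char *)ptr;
		ok = true;
	}
	signal_jump_set = 0;
	return ok;
}

/*
 * Hypothetical demo: probe one readable page and one PROT_NONE page.
 * Returns 0 on success, -1 if the setup itself failed.
 */
static int probe_demo(bool *rw_readable, bool *none_readable)
{
	const size_t pagesize = (size_t)sysconf(_SC_PAGESIZE);
	char *rw, *none;

	if (install_segv_handler())
		return -1;

	rw = mmap(NULL, pagesize, PROT_READ | PROT_WRITE,
		  MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	none = mmap(NULL, pagesize, PROT_NONE,
		    MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (rw == MAP_FAILED || none == MAP_FAILED)
		return -1;

	*rw_readable = try_read_buf(rw);
	*none_readable = try_read_buf(none);

	munmap(rw, pagesize);
	munmap(none, pagesize);
	return 0;
}
```

Calling probe_demo() from a small main() should report the read/write page as readable and the PROT_NONE page as not readable, which is the same distinction the selftest relies on after MADV_DONTNEED.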


[PATCH bpf-next 0/3] bpf: Show precise rejected function when attaching to __noreturn and __btf_id functions

by KaFai Wan

Show precise rejected function name when attaching to __noreturn and __btf_id functions. Add selftest for attaching tracing to __btf_id functions.
---
KaFai Wan (3):
  bpf: Show precise rejected function when attaching to __noreturn functions
  bpf: Show precise rejected function when attaching to __btf_id functions
  selftests/bpf: Add selftest for attaching tracing to __btf_id functions

 kernel/bpf/verifier.c                          |  5 ++++-
 .../selftests/bpf/prog_tests/tracing_btf_ids.c | 16 ++++++++++++++++
 .../selftests/bpf/progs/fexit_noreturns.c      |  2 +-
 .../selftests/bpf/progs/tracing_btf_ids.c      | 15 +++++++++++++++
 4 files changed, 36 insertions(+), 2 deletions(-)
 create mode 100644 tools/testing/selftests/bpf/prog_tests/tracing_btf_ids.c
 create mode 100644 tools/testing/selftests/bpf/progs/tracing_btf_ids.c

--
2.43.0


[PATCH 00/10] mm/mremap: permit mremap() move of multiple VMAs

by Lorenzo Stoakes

Historically we've made it a uAPI requirement that mremap() may only operate on a single VMA at a time.

For instances where VMAs need to be resized, this makes sense, as it becomes very difficult to determine what a user actually wants should they indicate a desire to expand or shrink the size of multiple VMAs (truncate? Adjust sizes individually? Some other strategy?).

However, in instances where a user is moving VMAs, it is restrictive to disallow this.

This is especially the case when an anonymous mapping remap may or may not be mergeable depending on whether VMAs have or have not been faulted, due to anon_vma assignment and folio index alignment with vma->vm_pgoff.

Often this can result in surprising impact where a moved region is faulted, then moved back, and a user fails to observe a merge from otherwise compatible, adjacent VMAs.

This change allows such cases to work without the user having to be cognizant of whether a prior mremap() move or other VMA operations have resulted in VMA fragmentation.

In order to do this, this series performs a large amount of refactoring, most pertinently grouping sanity checks together, separating those that check input parameters from those relating to VMAs.

We also simplify the post-mmap lock drop processing for uffd and mlock()'d VMAs.

With this done, we can then fairly straightforwardly implement this functionality.

This works exclusively for mremap() invocations which specify MREMAP_FIXED. It is not compatible with VMAs which use userfaultfd, as the notification of the userland fault handler would require us to drop the mmap lock.

The input and output address ranges must not overlap.

We carefully account for moves which would result in VMA merges or would otherwise result in VMA iterator invalidation.
Lorenzo Stoakes (10):
  mm/mremap: perform some simple cleanups
  mm/mremap: refactor initial parameter sanity checks
  mm/mremap: put VMA check and prep logic into helper function
  mm/mremap: cleanup post-processing stage of mremap
  mm/mremap: use an explicit uffd failure path for mremap
  mm/mremap: check remap conditions earlier
  mm/mremap: move remap_is_valid() into check_prep_vma()
  mm/mremap: clean up mlock populate behaviour
  mm/mremap: permit mremap() move of multiple VMAs
  tools/testing/selftests: extend mremap_test to test multi-VMA mremap

 fs/userfaultfd.c                         |  15 +-
 include/linux/userfaultfd_k.h            |   1 +
 mm/mremap.c                              | 502 ++++++++++++++---------
 tools/testing/selftests/mm/mremap_test.c | 145 ++++++-
 4 files changed, 462 insertions(+), 201 deletions(-)

--
2.50.0
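To make the user-visible effect of the cover letter concrete, here is a minimal userspace sketch, assuming a Linux system with glibc. It is not taken from the series or from mremap_test.c; try_move_range() is a hypothetical helper. It maps two pages, optionally splits them into two VMAs with mprotect(), and then attempts a single MREMAP_MAYMOVE | MREMAP_FIXED move of the whole range to a reserved, non-overlapping destination. On kernels without multi-VMA move support the split case is expected to fail (typically with EFAULT), which the sketch reports rather than treats as an error.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical helper, for illustration only.
 *
 * Returns 0 if the two-page range was moved and its contents survived,
 * a positive errno value if mremap() refused the move, and -1 if the
 * setup failed or the moved data did not match.
 */
static int try_move_range(bool split_into_two_vmas)
{
	const size_t pagesize = (size_t)sysconf(_SC_PAGESIZE);
	const size_t len = 2 * pagesize;
	char *src, *dst, *res;
	int ret = 0;

	src = mmap(NULL, len, PROT_READ | PROT_WRITE,
		   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (src == MAP_FAILED)
		return -1;

	/* Reserve a destination; both ranges exist at once, so no overlap. */
	dst = mmap(NULL, len, PROT_NONE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (dst == MAP_FAILED) {
		munmap(src, len);
		return -1;
	}

	/* Fault both pages in with recognisable contents. */
	src[0] = 'A';
	src[pagesize] = 'B';

	/* Changing protection on the second page splits the VMA in two. */
	if (split_into_two_vmas &&
	    mprotect(src + pagesize, pagesize, PROT_READ)) {
		munmap(src, len);
		munmap(dst, len);
		return -1;
	}

	/* One mremap() call covering the whole (possibly multi-VMA) range. */
	res = mremap(src, len, len, MREMAP_MAYMOVE | MREMAP_FIXED, dst);
	if (res == MAP_FAILED) {
		ret = errno;
		munmap(src, len);
		munmap(dst, len);
		return ret;
	}

	if (res != dst || res[0] != 'A' || res[pagesize] != 'B')
		ret = -1;

	munmap(res, len);
	return ret;
}
```

A caller would expect try_move_range(false) to return 0 on any kernel, while try_move_range(true) returns 0 only where a move spanning multiple VMAs is permitted.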


[PATCH v21 0/9] PCI: EP: Add RC-to-EP doorbell with platform MSI controller

by Frank Li via B4 Relay

┌────────────┐   ┌───────────────────────────────────┐   ┌────────────────┐
│            │   │                                   │   │                │
│            │   │ PCI Endpoint                      │   │ PCI Host       │
│            │   │                                   │   │                │
│            │◄──┤ 1.platform_msi_domain_alloc_irqs()│   │                │
│            │   │                                   │   │                │
│ MSI        ├──►│ 2.write_msi_msg()                 ├──►├─BAR<n>         │
│ Controller │   │   update doorbell register address│   │                │
│            │   │   for BAR                         │   │                │
│            │   │                                   │   │ 3. Write BAR<n>│
│            │◄──┼───────────────────────────────────┼───┤                │
│            │   │                                   │   │                │
│            ├──►│ 4.Irq Handle                      │   │                │
│            │   │                                   │   │                │
│            │   │                                   │   │                │
└────────────┘   └───────────────────────────────────┘   └────────────────┘

This series is based on the old series at
https://lore.kernel.org/imx/20221124055036.1630573-1-Frank.Li@nxp.com/

The original patch only targeted the vntb driver, but the method is actually common. This series adds a new API to pci-epf-core, so any EP driver can use it.

The previous v2 discussion is here:
https://lore.kernel.org/imx/20230911220920.1817033-1-Frank.Li@nxp.com/

Changes in v21:
- Align to BAR size to try to fix the problem reported by Niklas.
- Rebase to v6.16-rc5
- Link to v20: https://lore.kernel.org/r/20250709-ep-msi-v20-0-43d56f9bd54a@nxp.com

Changes in v20:
- Remove the patch that set the epf of_node; only one epf is supported now.
- Move the imx6 patch to first.
- For detailed changes, see each patch's change log.
- Link to v19: https://lore.kernel.org/r/20250609-ep-msi-v19-0-77362eaa48fa@nxp.com

Changes in v19:
- The irq part is already in v6.16-rc1; only the pcie/dts part was missing.
- Rebase to v6.16-rc1
- Update commit message for the IMMUTABLE check patch.
- Link to v18: https://lore.kernel.org/r/20250414-ep-msi-v18-0-f69b49917464@nxp.com

Changes in v18:
- pci-ep.yaml: sort property order, fix maxvalue to 0x7ffff for msi-map-mask and iommu-map-mask
- Link to v17: https://lore.kernel.org/r/20250407-ep-msi-v17-0-633ab45a31d0@nxp.com

Changes in v17:
- Move document part to pci-ep.yaml
- Link to v16: https://lore.kernel.org/r/20250404-ep-msi-v16-0-d4919d68c0d0@nxp.com

Changes in v16:
- Remove "arm64: dts: imx95-19x19-evk: Add PCIe1 endpoint function overlay file" because there are better patches, which are under review.
- Add document for pcie-ep msi-map usage
- For other changes, see each patch's change log

About IMMUTABLE (no change for this part; tglx provided feedback):

> - This IMMUTABLE thing serves no purpose, because you don't randomly
>   plug this end-point block on any MSI controller. They come as part
>   of an SoC.

"Yes and no. The problem is that the EP implementation is meant to be a generic library and while GIC-ITS guarantees immutability of the address/data pair after setup, there are architectures (x86, loongson, riscv) where the base MSI controller does not and immutability is only achieved when interrupt remapping is enabled. The latter can be disabled at boot-time and then the EP implementation becomes a lottery across affinity changes.

That was my concern about this library implementation and that's why I asked for a mechanism to ensure that the underlying irqdomain provides a immutable address/data pair.

So it does not matter for GIC-ITS, but in the larger picture it matters.

Thanks,

tglx
"

So it does not matter for GIC-ITS, but in the larger picture it matters.

- Link to v15: https://lore.kernel.org/r/20250211-ep-msi-v15-0-bcacc1f2b1a9@nxp.com

Changes in v15:
- Rebase to v6.14-rc1
- Fix build issue found by kernel test robot
- Link to v14: https://lore.kernel.org/r/20250207-ep-msi-v14-0-9671b136f2b8@nxp.com

Changes in v14:
Marc Zyngier raised concerns about adding DOMAIN_BUS_DEVICE_PCI_EP_MSI. As a result, the approach has been reverted to the v9 method. However, there are several improvements:

MSI now supports msi-map in addition to msi-parent.
- The struct device: id is used as the endpoint function (EPF) device identity to map to the stream ID (sideband information).
- The EPC device tree source (DTS) utilizes msi-map to provide such information.
- The EPF device's of_node is set to the EPC controller’s node.
This approach is commonly used for multi-function device (MFD) platform child devices, allowing them to inherit properties from the MFD device’s DTS, such as reset-cells and gpio-cells. This method is well-suited for the current case, as the EPF is inherently created by and bound to the EPC and should inherit the EPC’s DTS node properties.

Additionally:
Since the basic IMX95 LUT support has already been merged into the mainline, a DTS and driver increment patch is added to complete the solution. The patch is rebased onto the latest linux-next tree and aligned with the new pcitest framework.
- Link to v13: https://lore.kernel.org/r/20241218-ep-msi-v13-0-646e2192dc24@nxp.com

Changes in v13:
- Change to use DOMAIN_BUS_PCI_DEVICE_EP_MSI
- Change request id as func | vfunc << 3
- Remove IRQ_DOMAIN_MSI_IMMUTABLE

Thomas Gleixner: I hope I captured all your points from the review comments. If I missed any, let me know.
- Link to v12: https://lore.kernel.org/r/20241211-ep-msi-v12-0-33d4532fa520@nxp.com

Changes in v12:
- Change to use IRQ_DOMAIN_MSI_IMMUTABLE and add helper function irq_domain_msi_is_immuatble().
- Split "PCI: endpoint: pci-ep-msi: Add MSI address/data pair mutable check" into 3 patches
- Link to v11: https://lore.kernel.org/r/20241209-ep-msi-v11-0-7434fa8397bd@nxp.com

Changes in v11:
- Change to use MSI_FLAG_MSG_IMMUTABLE
- Link to v10: https://lore.kernel.org/r/20241204-ep-msi-v10-0-87c378dbcd6d@nxp.com

Changes in v10:
Thomas Gleixner: There are big changes in pci-ep-msi.c. I am not sure if I am on the correct path. The key improvement is removing the limitation of only 1 function device. I use a new patch for the immutable check, which is an additional feature relative to the base enablement patch.

- Remove patch "Add msi_remove_device_irq_domain() in platform_device_msi_free_irqs_all()"
- Add new patch "irqchip/gic-v3-its: Avoid overwriting msi_prepare callback if provided by msi_domain_info"
- Remove the limitation of supporting only 1 endpoint function.
- Create one MSI domain for each endpoint function device.
- Use "msi-map" in the pci ep controller node instead of msi-parent. The first argument is (func_no << 8 | vfunc_no)
- Link to v9: https://lore.kernel.org/r/20241203-ep-msi-v9-0-a60dbc3f15dd@nxp.com

Changes in v9:
- Add patch "platform-msi: Add msi_remove_device_irq_domain() in platform_device_msi_free_irqs_all()"
- Remove patch "PCI: endpoint: Add pci_epc_get_fn() API for customizable filtering"
- Remove API pci_epf_align_inbound_addr_lo_hi
- Move doorbell_alloc into the doorbell_enable function.
- Link to v8: https://lore.kernel.org/r/20241116-ep-msi-v8-0-6f1f68ffd1bb@nxp.com

Changes in v8:
- Update helper function name to pci_epf_align_inbound_addr()
- Link to v7: https://lore.kernel.org/r/20241114-ep-msi-v7-0-d4ac7aafbd2c@nxp.com

Changes in v7:
- Add helper function pci_epf_align_addr();
- Link to v6: https://lore.kernel.org/r/20241112-ep-msi-v6-0-45f9722e3c2a@nxp.com

Changes in v6:
- Change doorbell_addr to doorbell_offset
- Use round_down()
- Add Niklas's Tested-by tag
- Rebase to pci/endpoint
- Link to v5: https://lore.kernel.org/r/20241108-ep-msi-v5-0-a14951c0d007@nxp.com

Changes in v5:
- Move request_irq to the epf test function driver for more flexible use cases
- Add fixed size bar handler
- Some minor improvements; see each patch's changelog.
- Link to v4: https://lore.kernel.org/r/20241031-ep-msi-v4-0-717da2d99b28@nxp.com

Changes in v4:
- Remove patch "genirq/msi: Add cleanup guard define for msi_lock_descs()/msi_unlock_descs()"
- Use a new method to avoid compatibility problems. Add new commands DOORBELL_ENABLE and DOORBELL_DISABLE. pcitest -B sends DOORBELL_ENABLE first, and the EP test function driver tries to remap one of BAR_N (except the test register BAR) to the ITS MSI MMIO space. An old driver doesn't support the new command, so it returns failure with no side effect. After the test, the DOORBELL_DISABLE command is sent to restore the original mapping, so the pcitest BAR test can pass as normal.
- For other detailed changes, see each patch's change log
- Link to v3: https://lore.kernel.org/r/20241015-ep-msi-v3-0-cedc89a16c1a@nxp.com

Change from v2 to v3
- Fixed manivannan's comments
- Move common part to pci-ep-msi.c and pci-ep-msi.h
- Rebase to 6.12-rc1
- Use RevID to distinguish old version

mkdir /sys/kernel/config/pci_ep/functions/pci_epf_test/func1
echo 16 > /sys/kernel/config/pci_ep/functions/pci_epf_test/func1/msi_interrupts
echo 0x080c > /sys/kernel/config/pci_ep/functions/pci_epf_test/func1/deviceid
echo 0x1957 > /sys/kernel/config/pci_ep/functions/pci_epf_test/func1/vendorid
echo 1 > /sys/kernel/config/pci_ep/functions/pci_epf_test/func1/revid
                                                               ^^^^^^ to enable platform msi support.
ln -s /sys/kernel/config/pci_ep/functions/pci_epf_test/func1 /sys/kernel/config/pci_ep/controllers/4c380000.pcie-ep

- Use a new device ID, which identifies doorbell support, to avoid breaking compatibility.

Enable doorbell support only for PCI_DEVICE_ID_IMX8_DB, while other devices keep the same behavior as before.

EP side                  RC with old driver    RC with new driver
PCI_DEVICE_ID_IMX8_DB    no probe              doorbell enabled
Other device ID          doorbell disabled*    doorbell disabled*

* Behavior remains unchanged.

Change from v1 to v2
- Add missed patch for endpoint/pci-epf-test.c
- Move alloc and free to epc driver from epf.
- Provide general helper function for EPC driver to alloc platform msi irq.
- Fixed manivannan's comments.
Signed-off-by: Frank Li <Frank.Li(a)nxp.com>
---
Frank Li (9):
  PCI: imx6: Add helper function imx_pcie_add_lut_by_rid()
  PCI: imx6: Add LUT configuration for MSI/IOMMU in Endpoint mode
  PCI: endpoint: Add RC-to-EP doorbell support using platform MSI controller
  PCI: endpoint: pci-ep-msi: Add MSI address/data pair mutable check
  PCI: endpoint: Add pci_epf_align_inbound_addr() helper for address alignment
  PCI: endpoint: pci-epf-test: Add doorbell test support
  misc: pci_endpoint_test: Add doorbell test case
  selftests: pci_endpoint: Add doorbell test case
  arm64: dts: imx95: Add msi-map for pci-ep device

 Documentation/PCI/endpoint/pci-test-howto.rst  |  14 +++
 arch/arm64/boot/dts/freescale/imx95.dtsi       |   1 +
 drivers/misc/pci_endpoint_test.c               |  85 ++++++++++++-
 drivers/pci/controller/dwc/pci-imx6.c          |  25 ++--
 drivers/pci/endpoint/Kconfig                   |   8 ++
 drivers/pci/endpoint/Makefile                  |   1 +
 drivers/pci/endpoint/functions/pci-epf-test.c  | 136 +++++++++++++++++++++
 drivers/pci/endpoint/pci-ep-msi.c              |  98 +++++++++++++++
 drivers/pci/endpoint/pci-epf-core.c            |  36 ++++++
 include/linux/pci-ep-msi.h                     |  28 +++++
 include/linux/pci-epf.h                        |  18 +++
 include/uapi/linux/pcitest.h                   |   1 +
 .../selftests/pci_endpoint/pci_endpoint_test.c |  28 +++++
 13 files changed, 470 insertions(+), 9 deletions(-)
---
base-commit: d7b8f8e20813f0179d8ef519541a3527e7661d3a
change-id: 20241010-ep-msi-8b4cab33b1be

Best regards,
--
Frank Li <Frank.Li(a)nxp.com>
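The changelog mentions aligning to the BAR size, replacing doorbell_addr with doorbell_offset, and using round_down(). The arithmetic behind that idea can be sketched in plain C as below. This is an illustration only, not the kernel's pci_epf_align_inbound_addr(): the function name split_doorbell_addr(), its signature, and the example addresses are all assumptions, and the alignment is assumed to be a power of two as BAR sizes are.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if v is a non-zero power of two. */
static bool is_pow2(uint64_t v)
{
	return v != 0 && (v & (v - 1)) == 0;
}

/*
 * Hypothetical helper, for illustration only.
 *
 * Split a doorbell (MSI message) address into a base aligned to `align`
 * and the offset of the doorbell register within that aligned window.
 * The base is what an inbound BAR mapping would point at; the offset is
 * what the host would add to the BAR address when it rings the doorbell.
 *
 * Returns 0 on success, -1 if `align` is not a power of two.
 */
static int split_doorbell_addr(uint64_t addr, uint64_t align,
			       uint64_t *base, uint64_t *offset)
{
	if (!is_pow2(align))
		return -1;

	/* Equivalent to round_down(addr, align) for power-of-two align. */
	*base = addr & ~(align - 1);
	*offset = addr - *base;
	return 0;
}
```

For example, with a made-up doorbell address of 0x48040040 and a 64 KiB window, the sketch yields a base of 0x48040000 and an offset of 0x40.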


[PATCH] kunit: Enable PCI on UML without triggering WARN()

by Thomas Weißschuh

Various KUnit tests require PCI infrastructure to work. All normal
platforms enable PCI by default, but UML does not. Enabling PCI from
.kunitconfig files is problematic as it would not be portable. So in
commit 6fc3a8636a7b ("kunit: tool: Enable virtio/PCI by default on UML")
PCI was enabled by way of CONFIG_UML_PCI_OVER_VIRTIO=y. However
CONFIG_UML_PCI_OVER_VIRTIO requires additional configuration of
CONFIG_UML_PCI_OVER_VIRTIO_DEVICE_ID or will otherwise trigger a WARN() in
virtio_pcidev_init(). However there is no one correct value for
UML_PCI_OVER_VIRTIO_DEVICE_ID which could be used by default.

This warning is confusing when debugging test failures.

On the other hand, the functionality of CONFIG_UML_PCI_OVER_VIRTIO is not
used at all, given that it is completely non-functional as indicated by
the WARN() in question. Instead it is only used as a way to enable
CONFIG_UML_PCI which itself is not directly configurable.

Instead of going through CONFIG_UML_PCI_OVER_VIRTIO, introduce a custom
configuration option which enables CONFIG_UML_PCI without triggering
warnings or building dead code.

Signed-off-by: Thomas Weißschuh <thomas.weissschuh(a)linutronix.de>
---
 lib/kunit/Kconfig                           | 7 +++++++
 tools/testing/kunit/configs/arch_uml.config | 5 ++---
 2 files changed, 9 insertions(+), 3 deletions(-)

diff --git a/lib/kunit/Kconfig b/lib/kunit/Kconfig
index a97897edd9642f3e5df7fdd9dee26ee5cf00d6a4..c8ca155521b2455a221ddbec3f6fc55662c83475 100644
--- a/lib/kunit/Kconfig
+++ b/lib/kunit/Kconfig
@@ -93,4 +93,11 @@ config KUNIT_AUTORUN_ENABLED
 	  In most cases this should be left as Y. Only if additional opt-in
 	  behavior is needed should this be set to N.
 
+config KUNIT_UML_PCI
+	bool "KUnit UML PCI Support"
+	depends on UML
+	select UML_PCI
+	help
+	  Enables the PCI subsystem on UML for use by KUnit tests.
+
 endif # KUNIT
diff --git a/tools/testing/kunit/configs/arch_uml.config b/tools/testing/kunit/configs/arch_uml.config
index 54ad8972681a2cc724e6122b19407188910b9025..28edf816aa70e6f408d9486efff8898df79ee090 100644
--- a/tools/testing/kunit/configs/arch_uml.config
+++ b/tools/testing/kunit/configs/arch_uml.config
@@ -1,8 +1,7 @@
 # Config options which are added to UML builds by default
 
-# Enable virtio/pci, as a lot of tests require it.
-CONFIG_VIRTIO_UML=y
-CONFIG_UML_PCI_OVER_VIRTIO=y
+# Enable pci, as a lot of tests require it.
+CONFIG_KUNIT_UML_PCI=y
 
 # Enable FORTIFY_SOURCE for wider checking.
 CONFIG_FORTIFY_SOURCE=y

---
base-commit: 19272b37aa4f83ca52bdf9c16d5d81bdd1354494
change-id: 20250626-kunit-uml-pci-a2b687553746

Best regards,
-- 
Thomas Weißschuh <thomas.weissschuh(a)linutronix.de>
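
One way to sanity-check the effect of this change on a config fragment is sketched below. This is only an illustration, not part of the patch: the helper name check_uml_pci_config and the temporary file are hypothetical, and the fragment contents are copied from the diff above. The helper accepts a fragment only if it enables CONFIG_KUNIT_UML_PCI and no longer enables CONFIG_UML_PCI_OVER_VIRTIO.

```shell
#!/bin/sh
# Hypothetical helper (not from the patch): succeed only if the fragment
# enables the new option and no longer enables the virtio-based one.
check_uml_pci_config() {
	cfg=$1
	grep -q '^CONFIG_KUNIT_UML_PCI=y$' "$cfg" || return 1
	! grep -q '^CONFIG_UML_PCI_OVER_VIRTIO=y$' "$cfg"
}

tmp=$(mktemp)

# Fragment as it looks after the patch.
cat > "$tmp" <<'EOF'
# Enable pci, as a lot of tests require it.
CONFIG_KUNIT_UML_PCI=y
EOF
if check_uml_pci_config "$tmp"; then new_fragment=ok; else new_fragment=bad; fi

# Fragment as it looked before the patch.
cat > "$tmp" <<'EOF'
# Enable virtio/pci, as a lot of tests require it.
CONFIG_VIRTIO_UML=y
CONFIG_UML_PCI_OVER_VIRTIO=y
EOF
if check_uml_pci_config "$tmp"; then old_fragment=ok; else old_fragment=bad; fi

rm -f "$tmp"
echo "new fragment: $new_fragment, old fragment: $old_fragment"
```

The same helper could be pointed at a generated kernel config instead of a temporary file, assuming the path to that file is known.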


[PATCH 0/3] signal handling support for nolibc

by Benjamin Berg

From: Benjamin Berg <benjamin.berg(a)intel.com>

Hi,

This patchset adds signal handling to nolibc. Initially, I would like to
use this for tests. But in the long run, the goal is to use nolibc for
the UML kernel itself. In both cases, signal handling will be needed.

Benjamin

Benjamin Berg (3):
  tools/nolibc: show failed run if test process crashes
  tools/nolibc: add more generic BITSET_* macros for FD_*
  tools/nolibc: add signal support

 tools/include/nolibc/arch-arm.h               |   7 ++
 tools/include/nolibc/arch-arm64.h             |   3 +
 tools/include/nolibc/arch-loongarch.h         |   3 +
 tools/include/nolibc/arch-m68k.h              |  10 ++
 tools/include/nolibc/arch-mips.h              |   3 +
 tools/include/nolibc/arch-powerpc.h           |   8 ++
 tools/include/nolibc/arch-riscv.h             |   3 +
 tools/include/nolibc/arch-s390.h              |   8 +-
 tools/include/nolibc/arch-sh.h                |   5 +
 tools/include/nolibc/arch-sparc.h             |  47 ++++++++
 tools/include/nolibc/arch-x86.h               |  13 +++
 tools/include/nolibc/signal.h                 | 103 ++++++++++++++++++
 tools/include/nolibc/sys.h                    |   2 +-
 tools/include/nolibc/time.h                   |   3 +-
 tools/include/nolibc/types.h                  |  67 ++++++------
 .../testing/selftests/nolibc/Makefile.nolibc  |   3 +-
 tools/testing/selftests/nolibc/nolibc-test.c  |  67 ++++++++++++
 17 files changed, 319 insertions(+), 36 deletions(-)

-- 
2.50.0


[PATCH -next] selftests/ftrace: Prevent potential failure in subsystem-enable test case

by Tengda Wu

The first 100 lines of trace output don't always contain 3 or more distinct
events. In busy systems, they may be dominated by repetitive events like
sched_stat_runtime, causing the `$count -lt 3` check to fail.

Example trace:

  $ head -n 100 trace | grep -v ^#
  systemd-timesyn-266 [006] d.h2. 738.778482: sched_stat_runtime: comm=systemd-timesyn pid=266 runtime=976854 [ns]
  ftracetest-8751 [001] d.h2. 738.778512: sched_stat_runtime: comm=ftracetest pid=8751 runtime=938335 [ns]
  systemd-timesyn-266 [006] d.h1. 738.779531: sched_stat_runtime: comm=systemd-timesyn pid=266 runtime=1044284 [ns]
  ftracetest-8751 [001] d.h2. 738.779541: sched_stat_runtime: comm=ftracetest pid=8751 runtime=1028575 [ns]
  systemd-1 [007] d.h5. 738.779657: sched_stat_runtime: comm=systemd pid=1 runtime=642624 [ns]
  [...]

With trace cleared, simply check `$count -eq 0` to confirm subsystem
enablement, just like toplevel-enable.tc does.

Fixes: 1a4ea83a6e67 ("selftests/ftrace: Limit length in subsystem-enable tests")
Signed-off-by: Tengda Wu <wutengda(a)huaweicloud.com>
---
 .../selftests/ftrace/test.d/event/subsystem-enable.tc | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/tools/testing/selftests/ftrace/test.d/event/subsystem-enable.tc b/tools/testing/selftests/ftrace/test.d/event/subsystem-enable.tc
index b7c8f29c09a9..3a28adc7b727 100644
--- a/tools/testing/selftests/ftrace/test.d/event/subsystem-enable.tc
+++ b/tools/testing/selftests/ftrace/test.d/event/subsystem-enable.tc
@@ -19,8 +19,8 @@ echo 'sched:*' > set_event
 yield
 
 count=`head -n 100 trace | grep -v ^# | awk '{ print $5 }' | sort -u | wc -l`
-if [ $count -lt 3 ]; then
-    fail "at least fork, exec and exit events should be recorded"
+if [ $count -eq 0 ]; then
+    fail "none of scheduler events are recorded"
 fi
 
 do_reset
@@ -30,8 +30,8 @@ echo 1 > events/sched/enable
 yield
 
 count=`head -n 100 trace | grep -v ^# | awk '{ print $5 }' | sort -u | wc -l`
-if [ $count -lt 3 ]; then
-    fail "at least fork, exec and exit events should be recorded"
+if [ $count -eq 0 ]; then
+    fail "none of scheduler events are recorded"
 fi
 
 do_reset
-- 
2.34.1
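
The failure mode described in this patch can be reproduced without a tracing-enabled kernel. The sketch below is an illustration only: it feeds three of the sample lines from the commit message (held in a hypothetical variable trace_sample) through the same grep/awk/sort/wc pipeline the test uses, then applies the old and the new threshold to the result.

```shell
#!/bin/sh
# Sample lines copied from the trace excerpt in the commit message.
# Field 5 of each line is the event name.
trace_sample='systemd-timesyn-266 [006] d.h2. 738.778482: sched_stat_runtime: comm=systemd-timesyn pid=266 runtime=976854 [ns]
ftracetest-8751 [001] d.h2. 738.778512: sched_stat_runtime: comm=ftracetest pid=8751 runtime=938335 [ns]
systemd-1 [007] d.h5. 738.779657: sched_stat_runtime: comm=systemd pid=1 runtime=642624 [ns]'

# Same pipeline as the test: drop comment lines, take the event name,
# count distinct values. tr strips padding that some wc versions emit.
count=$(printf '%s\n' "$trace_sample" | grep -v '^#' | awk '{ print $5 }' | sort -u | wc -l | tr -d ' ')

# Old check: requires at least 3 distinct events.
if [ "$count" -lt 3 ]; then old_check=fail; else old_check=pass; fi
# New check: requires at least one event.
if [ "$count" -eq 0 ]; then new_check=fail; else new_check=pass; fi

echo "distinct events: $count, old check: $old_check, new check: $new_check"
```

With only sched_stat_runtime in the sample, the pipeline yields a single distinct event, so the old check fails even though the subsystem is clearly producing events, while the new check passes.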


[PATCH net] selftests: net: lib: fix shift count out of range

by Hangbin Liu

I got the following warning when writing other tests:

  + handle_test_result_pass 'bond 802.3ad' '(lacp_active off)'
  + local 'test_name=bond 802.3ad'
  + shift
  + local 'opt_str=(lacp_active off)'
  + shift
  + log_test_result 'bond 802.3ad' '(lacp_active off)' ' OK '
  + local 'test_name=bond 802.3ad'
  + shift
  + local 'opt_str=(lacp_active off)'
  + shift
  + local 'result= OK '
  + shift
  + local retmsg=
  + shift
  /net/tools/testing/selftests/net/forwarding/../lib.sh: line 315: shift: shift count out of range

This happens because an extra shift is executed even after all arguments
have been consumed. Remove the last shift in log_test_result() to avoid
this warning.

Fixes: a923af1ceee7 ("selftests: forwarding: Convert log_test() to recognize RET values")
Signed-off-by: Hangbin Liu <liuhangbin(a)gmail.com>
---
 tools/testing/selftests/net/lib.sh | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/testing/selftests/net/lib.sh b/tools/testing/selftests/net/lib.sh
index 006fdadcc4b9..86a216e9aca8 100644
--- a/tools/testing/selftests/net/lib.sh
+++ b/tools/testing/selftests/net/lib.sh
@@ -312,7 +312,7 @@ log_test_result()
 	local test_name=$1; shift
 	local opt_str=$1; shift
 	local result=$1; shift
-	local retmsg=$1; shift
+	local retmsg=$1
 
 	printf "TEST: %-60s [%s]\n" "$test_name $opt_str" "$result"
 	if [[ $retmsg ]]; then
-- 
2.46.0
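
The argument handling behind this warning can be reduced to a small sketch. The functions parse_old and parse_new below are hypothetical reductions of log_test_result(), keeping only the parameter parsing; they are not the code in lib.sh. Their bodies run in subshells with stderr discarded, because shells differ in how they report a shift past the last argument (some print a message, some treat it as a fatal error), so the sketch relies only on the exit status.

```shell
#!/bin/sh
# Hypothetical reduction of the old parsing: one shift per parameter,
# including the optional fourth one.
parse_old() (
	test_name=$1; shift
	opt_str=$1; shift
	result=$1; shift
	retmsg=$1; shift    # nothing left to shift when retmsg is omitted
) 2>/dev/null

# Hypothetical reduction of the fixed parsing: the last parameter is
# read without shifting.
parse_new() (
	test_name=$1; shift
	opt_str=$1; shift
	result=$1; shift
	retmsg=$1
) 2>/dev/null

# Three arguments, as in the xtrace output above.
if parse_old 'bond 802.3ad' '(lacp_active off)' ' OK '; then old_status=ok; else old_status=failed; fi
if parse_new 'bond 802.3ad' '(lacp_active off)' ' OK '; then new_status=ok; else new_status=failed; fi

# Four arguments: the old parsing only works when retmsg is supplied.
if parse_old 'bond 802.3ad' '(lacp_active off)' 'FAIL' 'some message'; then old_with_msg=ok; else old_with_msg=failed; fi

echo "3 args, old: $old_status"
echo "3 args, new: $new_status"
echo "4 args, old: $old_with_msg"
```

Since retmsg is the last parameter and nothing reads the positional parameters afterwards, dropping the final shift does not change what the function sees.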

