Linux-kselftest-mirror

linux-kselftest-mirror@lists.linaro.org


[PATCH] selftests/mm: on-fault-limit: run test without root privileges otherwise skip

by Muhammad Usama Anjum

The mmap() respects rlimit only for normal users. This test should be run as a normal user, without root privileges.

Fixes: b6221771d468 ("selftests/mm: run_vmtests: remove sudo and conform to tap")
Signed-off-by: Muhammad Usama Anjum <usama.anjum(a)collabora.com>
---
 tools/testing/selftests/mm/on-fault-limit.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/tools/testing/selftests/mm/on-fault-limit.c b/tools/testing/selftests/mm/on-fault-limit.c
index 0ea98ffab3589..431c1277d83a1 100644
--- a/tools/testing/selftests/mm/on-fault-limit.c
+++ b/tools/testing/selftests/mm/on-fault-limit.c
@@ -21,7 +21,7 @@ static void test_limit(void)
 
 	map = mmap(NULL, 2 * lims.rlim_max, PROT_READ | PROT_WRITE,
 		   MAP_PRIVATE | MAP_ANONYMOUS | MAP_POPULATE, -1, 0);
-	ksft_test_result(map == MAP_FAILED, "Failed mmap\n");
+	ksft_test_result(map == MAP_FAILED, "The map failed respecting mlock limits\n");
 
 	if (map != MAP_FAILED)
 		munmap(map, 2 * lims.rlim_max);
@@ -33,8 +33,8 @@ int main(int argc, char **argv)
 	ksft_print_header();
 	ksft_set_plan(1);
 
-	if (getuid())
-		ksft_test_result_skip("Require root privileges to run\n");
+	if (!getuid())
+		ksft_test_result_skip("The test must be run from a normal user\n");
 	else
 		test_limit();
 
-- 
2.42.0
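The patch hinges on two decisions: skip when running as root (root normally holds CAP_IPC_LOCK, which exempts it from the memlock limit), and request a mapping larger than the hard limit so that a limit-respecting kernel must refuse it. A minimal C sketch of both decisions, assuming the limit in `lims` is RLIMIT_MEMLOCK as the "mlock limits" message suggests; `should_skip_as_root()` and `over_limit_size()` are hypothetical helpers, not code from on-fault-limit.c:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/resource.h>
#include <sys/types.h>

/*
 * Hypothetical helper: root normally holds CAP_IPC_LOCK, which bypasses
 * RLIMIT_MEMLOCK, so an "mmap must fail" check only means something for a
 * normal user. Mirrors the patch's `if (!getuid())` skip condition.
 */
int should_skip_as_root(uid_t uid)
{
	return uid == 0;
}

/*
 * Hypothetical helper: a size of 2 * rlim_max cannot fit under the hard
 * memlock limit. Returns 0 when no such size exists (unlimited limit, or
 * the doubling would overflow size_t).
 */
size_t over_limit_size(const struct rlimit *lim)
{
	assert(lim != NULL);
	if (lim->rlim_max == RLIM_INFINITY || lim->rlim_max > SIZE_MAX / 2)
		return 0;
	return (size_t)lim->rlim_max * 2;
}
```

Guarding the doubling keeps the sketch from computing a wrapped length when the limit is RLIM_INFINITY, a case where no mapping can exceed the limit anyway.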

1 year, 11 months

[PATCH net-next] selftests: net: Fix bridge backup port test flakiness

by Ido Schimmel

The test toggles the carrier of a bridge port in order to test the bridge backup port feature.

Due to the linkwatch delayed work, the carrier change is not always reflected fast enough in the bridge driver, and packets are not forwarded as the test expects, resulting in failures [1].

Fix by adding a one second delay after a carrier change in places where a packet is sent immediately after the carrier change.

[1]
 # Backup port
 # -----------
 [...]
 # TEST: swp1 carrier off [ OK ]
 # TEST: No forwarding out of swp1 [FAIL]
 [ 641.995910] br0: port 1(swp1) entered disabled state
 # TEST: No forwarding out of vx0 [ OK ]

Fixes: b408453053fb ("selftests: net: Add bridge backup port and backup nexthop ID test")
Signed-off-by: Ido Schimmel <idosch(a)nvidia.com>
---
Jakub, targeting net-next to see if it helps the CI, but it can be applied to net. I'm unable to reproduce the failure locally.
---
 tools/testing/selftests/net/test_bridge_backup_port.sh | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/tools/testing/selftests/net/test_bridge_backup_port.sh b/tools/testing/selftests/net/test_bridge_backup_port.sh
index 70a7d87ba2d2..92078b56ae0a 100755
--- a/tools/testing/selftests/net/test_bridge_backup_port.sh
+++ b/tools/testing/selftests/net/test_bridge_backup_port.sh
@@ -260,6 +260,7 @@ backup_port()
 
 	run_cmd "ip -n $sw1 link set dev swp1 carrier off"
 	log_test $? 0 "swp1 carrier off"
+	sleep 1
 
 	run_cmd "ip netns exec $sw1 mausezahn br0.10 -a $smac -b $dmac -A 198.51.100.1 -B 198.51.100.2 -t ip -p 100 -q -c 1"
 	tc_check_packets $sw1 "dev swp1 egress" 101 1
@@ -285,6 +286,7 @@ backup_port()
 
 	run_cmd "ip -n $sw1 link set dev swp1 carrier off"
 	log_test $? 0 "swp1 carrier off"
+	sleep 1
 
 	run_cmd "ip netns exec $sw1 mausezahn br0.10 -a $smac -b $dmac -A 198.51.100.1 -B 198.51.100.2 -t ip -p 100 -q -c 1"
 	tc_check_packets $sw1 "dev swp1 egress" 101 2
@@ -294,6 +296,7 @@ backup_port()
 
 	run_cmd "ip -n $sw1 link set dev swp1 carrier on"
 	log_test $? 0 "swp1 carrier on"
+	sleep 1
 
 	run_cmd "ip netns exec $sw1 mausezahn br0.10 -a $smac -b $dmac -A 198.51.100.1 -B 198.51.100.2 -t ip -p 100 -q -c 1"
 	tc_check_packets $sw1 "dev swp1 egress" 101 3
@@ -315,6 +318,7 @@ backup_port()
 
 	run_cmd "ip -n $sw1 link set dev swp1 carrier off"
 	log_test $? 0 "swp1 carrier off"
+	sleep 1
 
 	run_cmd "ip netns exec $sw1 mausezahn br0.10 -a $smac -b $dmac -A 198.51.100.1 -B 198.51.100.2 -t ip -p 100 -q -c 1"
 	tc_check_packets $sw1 "dev swp1 egress" 101 4
@@ -370,6 +374,7 @@ backup_nhid()
 
 	run_cmd "ip -n $sw1 link set dev swp1 carrier off"
 	log_test $? 0 "swp1 carrier off"
+	sleep 1
 
 	run_cmd "ip netns exec $sw1 mausezahn br0.10 -a $smac -b $dmac -A 198.51.100.1 -B 198.51.100.2 -t ip -p 100 -q -c 1"
 	tc_check_packets $sw1 "dev swp1 egress" 101 1
@@ -399,6 +404,7 @@ backup_nhid()
 
 	run_cmd "ip -n $sw1 link set dev swp1 carrier off"
 	log_test $? 0 "swp1 carrier off"
+	sleep 1
 
 	run_cmd "ip netns exec $sw1 mausezahn br0.10 -a $smac -b $dmac -A 198.51.100.1 -B 198.51.100.2 -t ip -p 100 -q -c 1"
 	tc_check_packets $sw1 "dev swp1 egress" 101 2
@@ -412,6 +418,7 @@ backup_nhid()
 
 	run_cmd "ip -n $sw1 link set dev swp1 carrier on"
 	log_test $? 0 "swp1 carrier on"
+	sleep 1
 
 	run_cmd "ip netns exec $sw1 mausezahn br0.10 -a $smac -b $dmac -A 198.51.100.1 -B 198.51.100.2 -t ip -p 100 -q -c 1"
 	tc_check_packets $sw1 "dev swp1 egress" 101 3
@@ -442,6 +449,7 @@ backup_nhid()
 
 	run_cmd "ip -n $sw1 link set dev swp1 carrier off"
 	log_test $? 0 "swp1 carrier off"
+	sleep 1
 
 	run_cmd "ip netns exec $sw1 mausezahn br0.10 -a $smac -b $dmac -A 198.51.100.1 -B 198.51.100.2 -t ip -p 100 -q -c 1"
 	tc_check_packets $sw1 "dev swp1 egress" 101 4
@@ -498,6 +506,7 @@ backup_nhid_invalid()
 
 	run_cmd "ip -n $sw1 link set dev swp1 carrier off"
 	log_test $? 0 "swp1 carrier off"
+	sleep 1
 
 	run_cmd "ip netns exec $sw1 mausezahn br0.10 -a $smac -b $dmac -A 198.51.100.1 -B 198.51.100.2 -t ip -p 100 -q -c 1"
 	tc_check_packets $sw1 "dev swp1 egress" 101 0
@@ -605,6 +614,7 @@ backup_nhid_ping()
 
 	run_cmd "ip -n $sw1 link set dev swp1 carrier off"
 	run_cmd "ip -n $sw2 link set dev swp1 carrier off"
+	sleep 1
 
 	run_cmd "ip netns exec $sw1 ping -i 0.1 -c 10 -w $PING_TIMEOUT 192.0.2.66"
 	log_test $? 0 "Ping with backup nexthop ID"
-- 
2.43.0
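The patch waits a fixed `sleep 1` after each carrier change. A common alternative for this kind of race, and not what the patch does, is to poll for the expected state with a timeout, so the test waits only as long as needed and still fails cleanly if the state never arrives. A minimal C sketch of such a bounded poll; `wait_until()` and the toy `countdown_done()` condition are hypothetical, standing in for a real check such as reading the bridge port state:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <time.h>

/* Returns nonzero once the awaited state has been reached. */
typedef int (*cond_fn)(void *arg);

/* Milliseconds elapsed on the monotonic clock since *start. */
static long elapsed_ms(const struct timespec *start)
{
	struct timespec now;

	clock_gettime(CLOCK_MONOTONIC, &now);
	return (now.tv_sec - start->tv_sec) * 1000L +
	       (now.tv_nsec - start->tv_nsec) / 1000000L;
}

/*
 * Hypothetical helper: poll cond(arg) every step_ms until it holds or
 * timeout_ms has elapsed. Returns 0 on success, -1 on timeout.
 */
int wait_until(cond_fn cond, void *arg, long timeout_ms, long step_ms)
{
	struct timespec start;
	struct timespec step = {
		.tv_sec = step_ms / 1000,
		.tv_nsec = (step_ms % 1000) * 1000000L,
	};

	assert(step_ms > 0);
	clock_gettime(CLOCK_MONOTONIC, &start);
	for (;;) {
		if (cond(arg))
			return 0;
		if (elapsed_ms(&start) >= timeout_ms)
			return -1;
		nanosleep(&step, NULL);
	}
}

/* Toy condition for illustration: true after `remaining` failed polls. */
struct countdown {
	int remaining;
};

int countdown_done(void *arg)
{
	struct countdown *c = arg;

	return c->remaining-- <= 0;
}
```

The trade-off versus a fixed delay is more code in exchange for a shorter typical runtime and an explicit upper bound on how long the test tolerates the linkwatch lag.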

[PATCH net 0/3] selftests: net: a few pmtu.sh fixes

by Paolo Abeni

This series tries to address CI failures for the pmtu.sh tests. It does _not_ attempt to enable all the currently skipped cases, to avoid adding more entropy.

Tested with:

make -C tools/testing/selftests/ TARGETS=net install
vng --build --config tools/testing/selftests/net/config
vng --run . --user root -- \
	./tools/testing/selftests/kselftest_install/run_kselftest.sh \
	-t net:pmtu.sh

Paolo Abeni (3):
  selftests: net: add missing config for pmtu.sh tests
  selftests: net: fix available tunnels detection
  selftests: net: don't access /dev/stdout in pmtu.sh

 tools/testing/selftests/net/config | 3 +++
 tools/testing/selftests/net/pmtu.sh | 18 +++++++++---------
 2 files changed, 12 insertions(+), 9 deletions(-)

-- 
2.43.0

Re: Regression on drm-tip

by Richard Fitzgerald

On 31/1/24 05:34, Borah, Chaitanya Kumar wrote:
> Hello Richard,
>
> Hope you are doing well. I am Chaitanya from the Linux graphics team in Intel.
>
> This mail is regarding a regression we are seeing in our CI runs[1] on drm-tip[2] repository.
> These are captured by gitlab issues[3].
>
> We bisected the issue and have found the following commit to be the first bad commit.
> `````````````````````````````````````````````````````````````````````````````````````````````````````````
> commit a0b84213f947176ddcd0e96e0751a109f28cde21
> Author: Richard Fitzgerald rf(a)opensource.cirrus.com
> Date: Mon Dec 18 15:17:29 2023 +0000
>
>     kunit: Fix NULL-dereference in kunit_init_suite() if suite->log is NULL
>
>     suite->log must be checked for NULL before passing it to
>     string_stream_clear(). This was done in kunit_init_test() but was missing
>     from kunit_init_suite().
>
>     Signed-off-by: Richard Fitzgerald rf(a)opensource.cirrus.com
>     Fixes: 6d696c4695c5 ("kunit: add ability to run tests after boot using debugfs")
>     Reviewed-by: Rae Moar rmoar(a)google.com
>     Acked-by: David Gow davidgow(a)google.com
>     Reviewed-by: Muhammad Usama Anjum usama.anjum(a)collabora.com
>     Signed-off-by: Shuah Khan skhan(a)linuxfoundation.org
>
>  lib/kunit/test.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
> `````````````````````````````````````````````````````````````````````````````````````````````````````````
> We tried reverting the patch and the original issue is not seen but it results in NULL pointer dereference[4] which I am guessing is expected.
>
> Could you please check why the patch causes this regression and provide a fix if necessary?
>
> [1] https://intel-gfx-ci.01.org/tree/drm-tip/index.html?testfilter=drm
> [2] https://cgit.freedesktop.org/drm-tip/
> [3] https://gitlab.freedesktop.org/drm/intel/-/issues/10140
> https://gitlab.freedesktop.org/drm/intel/-/issues/10143
> [4]
> [ 179.849411] [IGT] drm_buddy: executing
> [ 179.856385] [IGT] drm_buddy: starting subtest drm_buddy
> [ 179.862594] KTAP version 1
> [ 179.862600] 1..1
> [ 179.863375] BUG: kernel NULL pointer dereference, address: 0000000000000030
> [ 179.863381] #PF: supervisor read access in kernel mode
> [ 179.863384] #PF: error_code(0x0000) - not-present page
> [ 179.863387] PGD 0 P4D 0
> [ 179.863391] Oops: 0000 [#1] PREEMPT SMP NOPTI
> [ 179.863395] CPU: 1 PID: 1319 Comm: drm_buddy Not tainted 6.8.0-rc1-bisecttrail015 #16
> [ 179.863398] Hardware name: Intel Corporation Meteor Lake Client Platform/MTL-P DDR5 SODIMM SBS RVP, BIOS MTLPFWI1.R00.3471.D81.2311291340 11/29/2023
> [ 179.863400] RIP: 0010:__lock_acquire+0x71f/0x2300
> [ 179.863408] Code: 84 03 06 00 00 44 8b 15 27 f6 72 01 45 85 d2 0f 84 9c 00 00 00 f6 45 22 10 0f 84 63 03 00 00 41 bf 01 00 00 00 e9 8a 00 00 00 <48> 81 3f 40 d7 fa 82 41 b9 00 00 00 00 45 0f 45 c8 83 fe 01 0f 87
> ...
> [ 179.863445] PKRU: 55555554
> [ 179.863448] Call Trace:
> [ 179.863450] <TASK>
> [ 179.863453] ? __die_body+0x1a/0x60
> [ 179.863459] ? page_fault_oops+0x156/0x450
> [ 179.863465] ? do_user_addr_fault+0x65/0x9e0
> [ 179.863472] ? exc_page_fault+0x68/0x1a0
> [ 179.863479] ? asm_exc_page_fault+0x26/0x30
> [ 179.863487] ? __lock_acquire+0x71f/0x2300
> [ 179.863493] ? __pfx_do_sync_core+0x10/0x10
> [ 179.863500] lock_acquire+0xd8/0x2d0
> [ 179.863505] ? string_stream_clear+0x29/0xb0 [kunit]
> [ 179.863523] _raw_spin_lock+0x2e/0x40
> [ 179.863528] ? string_stream_clear+0x29/0xb0 [kunit]
> [ 179.863540] string_stream_clear+0x29/0xb0 [kunit]
> [ 179.863554] __kunit_test_suites_init+0x7e/0xe0 [kunit]
> [ 179.863568] kunit_module_notify+0x20f/0x220 [kunit]
> [ 179.863583] notifier_call_chain+0x46/0x130
> [ 179.863591] notifier_call_chain_robust+0x3e/0x90
> [ 179.863598] blocking_notifier_call_chain_robust+0x42/0x60
> [ 179.863605] load_module+0x1bcd/0x1f80
> [ 179.863617] ? init_module_from_file+0x86/0xd0
> [ 179.863621] init_module_from_file+0x86/0xd0
> [ 179.863629] idempotent_init_module+0x17c/0x230
> [ 179.863637] __x64_sys_finit_module+0x56/0xb0
> [ 179.863642] do_syscall_64+0x6f/0x140
> [ 179.863649] entry_SYSCALL_64_after_hwframe+0x6e/0x76
> [ 179.863654] RIP: 0033:0x7f0e6676195d

Looking at the gitlab bug reports compared to the crash log above:

[3] You have hit a failure on the 3rd test case:

<6> [59.039608] [IGT] drm_buddy: starting dynamic subtest drm_test_buddy_alloc_limit
<6> [59.077701] KTAP version 1
<6> [59.077705] 1..1
<6> [59.078487] KTAP version 1
<6> [59.078494] # Subtest: drm_buddy
<6> [59.078496] # module: drm_buddy_test
<6> [59.078498] 1..4
<6> [59.079321] ok 1 drm_test_buddy_alloc_limit
<6> [59.079973] ok 2 drm_test_buddy_alloc_optimistic
<6> [59.080479] [IGT] drm_buddy: finished subtest drm_test_buddy_alloc_limit, SUCCESS

When you revert my NULL-dereference bugfix, you hit the NULL dereference crash immediately, before executing the test case that is causing [3]:

> [ 179.862594] KTAP version 1
> [ 179.862600] 1..1
> [ 179.863375] BUG: kernel NULL pointer dereference

So, my commit is not causing your [3]. It is allowing you to reach your test case that is causing [3].
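The quoted commit message describes a simple guard: check `suite->log` for NULL before handing it to `string_stream_clear()`. A self-contained C sketch of that pattern with toy types; `toy_stream`, `toy_suite`, `toy_stream_clear()` and `toy_init_suite()` are hypothetical stand-ins, not KUnit's definitions. Without the `if`, `toy_init_suite()` would dereference NULL, the same shape as the oops above, where the fault occurs inside `string_stream_clear()` while it takes a lock:

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-ins, NOT KUnit's real structures. */
struct toy_stream {
	int fragments;                  /* pretend buffered log fragments */
};

struct toy_suite {
	struct toy_stream *log;         /* may legitimately be NULL */
};

/* Like string_stream_clear(), this unconditionally dereferences stream. */
void toy_stream_clear(struct toy_stream *stream)
{
	stream->fragments = 0;
}

/* The fix pattern: only clear the log if one was allocated. */
void toy_init_suite(struct toy_suite *suite)
{
	assert(suite != NULL);
	if (suite->log)
		toy_stream_clear(suite->log);
}
```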

[RFC PATCH v1 0/8] KVM: seftests: Support guest user mode execution and running

by Zeng Guang

This patch series proposes support for running the guest VM in user mode, together with a canonical linear-address organization.

First, it partitions the 64-bit canonical linear address space into two halves belonging to user mode and supervisor mode respectively, similar to the organization of linear addresses used in the Linux OS. Currently the linear addresses use the 48-bit canonical format, in which bits 63:47 of the address are identical.

Second, it sets up page tables mapping the same guest physical addresses of the test code and data segments into both the user-mode and the supervisor-mode address space. This allows the guest, in either runtime mode (user or supervisor), to run one code base in the corresponding linear address space.

It also provides a runtime environment setup API for switching to user-mode execution.

Zeng Guang (8):
  KVM: selftests: x86: Fix bug in addr_arch_gva2gpa()
  KVM: selftests: x86: Support guest running on canonical linear-address organization
  KVM: selftests: Add virt_arch_ucall_prealloc() arch specific implementation
  KVM : selftests : Adapt selftest cases to kernel canonical linear address
  KVM: selftests: x86: Prepare setup for user mode support
  KVM: selftests: x86: Allow user to access user-mode address and I/O address space
  KVM: selftests: x86: Support vcpu run in user mode
  KVM: selftests: x86: Add KVM forced emulation prefix capability

 .../selftests/kvm/include/kvm_util_base.h | 20 ++-
 .../selftests/kvm/include/x86_64/processor.h | 48 ++++++-
 .../selftests/kvm/lib/aarch64/processor.c | 5 +
 tools/testing/selftests/kvm/lib/kvm_util.c | 6 +-
 .../selftests/kvm/lib/riscv/processor.c | 5 +
 .../selftests/kvm/lib/s390x/processor.c | 5 +
 .../testing/selftests/kvm/lib/ucall_common.c | 2 +
 .../selftests/kvm/lib/x86_64/processor.c | 117 ++++++++++++++----
 .../selftests/kvm/set_memory_region_test.c | 13 +-
 .../testing/selftests/kvm/x86_64/debug_regs.c | 2 +-
 .../kvm/x86_64/userspace_msr_exit_test.c | 9 +-
 11 files changed, 195 insertions(+), 37 deletions(-)

-- 
2.21.3
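The canonical-address rule in the cover letter (for 48-bit addresses, bits 63:47 are identical) splits the space into a lower half and an upper half, which the series assigns to user mode and supervisor mode respectively. A small C sketch of that classification; the function names are hypothetical and not taken from the KVM selftests library:

```c
#include <assert.h>
#include <stdint.h>

/* Bits 63..47 of a virtual address (17 bits). */
static uint64_t top_bits(uint64_t va)
{
	return va >> 47;
}

/* 48-bit canonical: bits 63:47 are all zeros or all ones. */
int is_canonical48(uint64_t va)
{
	uint64_t top = top_bits(va);

	return top == 0 || top == 0x1ffff;
}

/* Lower canonical half, used for user mode in the described layout. */
int is_user_half(uint64_t va)
{
	return top_bits(va) == 0;
}

/* Upper canonical half, used for supervisor mode. */
int is_supervisor_half(uint64_t va)
{
	return top_bits(va) == 0x1ffff;
}
```

For example, `0xffff7fffffffffff` is rejected because bit 47 differs from bits 63:48, which places it in the non-canonical gap between the two halves.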

[PATCH] selftests: Add test to verify power supply properties

by Nícolas F. R. A. Prado

Add a kselftest that verifies power supply properties from sysfs and uevent. It checks whether they are present, readable and return valid values.

This initial set of properties is not comprehensive, but rather the ones that I was able to validate locally.

Co-developed-by: Sebastian Reichel <sebastian.reichel(a)collabora.com>
Signed-off-by: Sebastian Reichel <sebastian.reichel(a)collabora.com>
Signed-off-by: Nícolas F. R. A. Prado <nfraprado(a)collabora.com>
---
To give an idea of the output of the test, here's a short (trimmed) snippet:

TAP version 13
1..33
# Testing device BAT0
ok 21 BAT0.sysfs.voltage_max # SKIP
# Reported: '7600000' uV (7.6 V)
ok 22 BAT0.sysfs.voltage_min_design
# Totals: pass:19 fail:0 xfail:0 xpass:0 skip:14 error:0

Some things noticed during the development of this patch which may or may not need to be addressed:
- input_current_limit, input_voltage_limit reported -1 on one of the platforms, despite that value not being described in the ABI doc [1].
- voltage_min_design, voltage_max_design are missing in the ABI doc, though are mentioned in the rst documentation [2]
- the scope property is entirely undocumented

[1] Documentation/ABI/testing/sysfs-class-power
[2] Documentation/power/power_supply_class.rst
---
 MAINTAINERS | 1 +
 tools/testing/selftests/Makefile | 1 +
 tools/testing/selftests/power_supply/Makefile | 4 +
 tools/testing/selftests/power_supply/helpers.sh | 178 +++++++++++++++++++++
 .../power_supply/test_power_supply_properties.sh | 114 +++++++++++++
 5 files changed, 298 insertions(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index ad5bec15bf0f..f8f620746934 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -17524,6 +17524,7 @@ F:	Documentation/devicetree/bindings/power/supply/
 F:	drivers/power/supply/
 F:	include/linux/power/
 F:	include/linux/power_supply.h
+F:	tools/testing/selftests/power_supply/
 
 POWERNV OPERATOR PANEL LCD DISPLAY DRIVER
 M:	Suraj Jitindar Singh <sjitindarsingh(a)gmail.com>
diff --git a/tools/testing/selftests/Makefile b/tools/testing/selftests/Makefile
index c5b4574045b3..7e5960cda08c 100644
--- a/tools/testing/selftests/Makefile
+++ b/tools/testing/selftests/Makefile
@@ -67,6 +67,7 @@ TARGETS += nsfs
 TARGETS += perf_events
 TARGETS += pidfd
 TARGETS += pid_namespace
+TARGETS += power_supply
 TARGETS += powerpc
 TARGETS += prctl
 TARGETS += proc
diff --git a/tools/testing/selftests/power_supply/Makefile b/tools/testing/selftests/power_supply/Makefile
new file mode 100644
index 000000000000..44f0658d3d2e
--- /dev/null
+++ b/tools/testing/selftests/power_supply/Makefile
@@ -0,0 +1,4 @@
+TEST_PROGS := test_power_supply_properties.sh
+TEST_FILES := helpers.sh
+
+include ../lib.mk
diff --git a/tools/testing/selftests/power_supply/helpers.sh b/tools/testing/selftests/power_supply/helpers.sh
new file mode 100644
index 000000000000..1ec90d7c9108
--- /dev/null
+++ b/tools/testing/selftests/power_supply/helpers.sh
@@ -0,0 +1,178 @@
+#!/bin/sh
+# SPDX-License-Identifier: GPL-2.0
+#
+# Copyright (c) 2022, 2024 Collabora Ltd
+SYSFS_SUPPLIES=/sys/class/power_supply
+
+calc() {
+	awk "BEGIN { print $* }";
+}
+
+test_sysfs_prop() {
+	PROP="$1"
+	VALUE="$2" # optional
+
+	PROP_PATH="$SYSFS_SUPPLIES"/"$DEVNAME"/"$PROP"
+	TEST_NAME="$DEVNAME".sysfs."$PROP"
+
+	if [ -z "$VALUE" ]; then
+		ktap_test_result "$TEST_NAME" [ -f "$PROP_PATH" ]
+	else
+		ktap_test_result "$TEST_NAME" grep -q "$VALUE" "$PROP_PATH"
+	fi
+}
+
+to_human_readable_unit() {
+	VALUE="$1"
+	UNIT="$2"
+
+	case "$VALUE" in
+		*[!0-9]* ) return ;; # Not a number
+	esac
+
+	if [ "$UNIT" = "uA" ]; then
+		new_unit="mA"
+		div=1000
+	elif [ "$UNIT" = "uV" ]; then
+		new_unit="V"
+		div=1000000
+	elif [ "$UNIT" = "uAh" ]; then
+		new_unit="Ah"
+		div=1000000
+	elif [ "$UNIT" = "uW" ]; then
+		new_unit="mW"
+		div=1000
+	elif [ "$UNIT" = "uWh" ]; then
+		new_unit="Wh"
+		div=1000000
+	else
+		return
+	fi
+
+	value_converted=$(calc "$VALUE"/"$div")
+	echo "$value_converted" "$new_unit"
+}
+
+_check_sysfs_prop_available() {
+	PROP=$1
+
+	PROP_PATH="$SYSFS_SUPPLIES"/"$DEVNAME"/"$PROP"
+	TEST_NAME="$DEVNAME".sysfs."$PROP"
+
+	if [ ! -e "$PROP_PATH" ] ; then
+		ktap_test_skip "$TEST_NAME"
+		return 1
+	fi
+
+	if ! cat "$PROP_PATH" >/dev/null; then
+		ktap_print_msg "Failed to read"
+		ktap_test_fail "$TEST_NAME"
+		return 1
+	fi
+
+	return 0
+}
+
+test_sysfs_prop_optional() {
+	PROP=$1
+	UNIT=$2 # optional
+
+	TEST_NAME="$DEVNAME".sysfs."$PROP"
+
+	_check_sysfs_prop_available "$PROP" || return
+	DATA=$(cat "$SYSFS_SUPPLIES"/"$DEVNAME"/"$PROP")
+
+	ktap_print_msg "Reported: '$DATA' $UNIT ($(to_human_readable_unit "$DATA" "$UNIT"))"
+	ktap_test_pass "$TEST_NAME"
+}
+
+test_sysfs_prop_optional_range() {
+	PROP=$1
+	MIN=$2
+	MAX=$3
+	UNIT=$4 # optional
+
+	TEST_NAME="$DEVNAME".sysfs."$PROP"
+
+	_check_sysfs_prop_available "$PROP" || return
+	DATA=$(cat "$SYSFS_SUPPLIES"/"$DEVNAME"/"$PROP")
+
+	if [ "$DATA" -lt "$MIN" ] || [ "$DATA" -gt "$MAX" ]; then
+		ktap_print_msg "'$DATA' is out of range (min=$MIN, max=$MAX)"
+		ktap_test_fail "$TEST_NAME"
+	else
+		ktap_print_msg "Reported: '$DATA' $UNIT ($(to_human_readable_unit "$DATA" "$UNIT"))"
+		ktap_test_pass "$TEST_NAME"
+	fi
+}
+
+test_sysfs_prop_optional_list() {
+	PROP=$1
+	LIST=$2
+
+	TEST_NAME="$DEVNAME".sysfs."$PROP"
+
+	_check_sysfs_prop_available "$PROP" || return
+	DATA=$(cat "$SYSFS_SUPPLIES"/"$DEVNAME"/"$PROP")
+
+	valid=0
+
+	OLDIFS=$IFS
+	IFS=","
+	for item in $LIST; do
+		if [ "$DATA" = "$item" ]; then
+			valid=1
+			break
+		fi
+	done
+	if [ "$valid" -eq 1 ]; then
+		ktap_print_msg "Reported: '$DATA'"
+		ktap_test_pass "$TEST_NAME"
+	else
+		ktap_print_msg "'$DATA' is not a valid value for this property"
+		ktap_test_fail "$TEST_NAME"
+	fi
+	IFS=$OLDIFS
+}
+
+dump_file() {
+	FILE="$1"
+	while read -r line; do
+		ktap_print_msg "$line"
+	done < "$FILE"
+}
+
+__test_uevent_prop() {
+	PROP="$1"
+	OPTIONAL="$2"
+	VALUE="$3" # optional
+
+	UEVENT_PATH="$SYSFS_SUPPLIES"/"$DEVNAME"/uevent
+	TEST_NAME="$DEVNAME".uevent."$PROP"
+
+	if ! grep -q "POWER_SUPPLY_$PROP=" "$UEVENT_PATH"; then
+		if [ "$OPTIONAL" -eq 1 ]; then
+			ktap_test_skip "$TEST_NAME"
+		else
+			ktap_print_msg "Missing property"
+			ktap_test_fail "$TEST_NAME"
+		fi
+		return
+	fi
+
+	if ! grep -q "POWER_SUPPLY_$PROP=$VALUE" "$UEVENT_PATH"; then
+		ktap_print_msg "Invalid value for uevent property, dumping..."
+		dump_file "$UEVENT_PATH"
+		ktap_test_fail "$TEST_NAME"
+	else
+		ktap_test_pass "$TEST_NAME"
+	fi
+}
+
+test_uevent_prop() {
+	__test_uevent_prop "$1" 0 "$2"
+}
+
+test_uevent_prop_optional() {
+	__test_uevent_prop "$1" 1 "$2"
+}
diff --git a/tools/testing/selftests/power_supply/test_power_supply_properties.sh b/tools/testing/selftests/power_supply/test_power_supply_properties.sh
new file mode 100755
index 000000000000..df272dfe1d2a
--- /dev/null
+++ b/tools/testing/selftests/power_supply/test_power_supply_properties.sh
@@ -0,0 +1,114 @@
+#!/bin/sh
+# SPDX-License-Identifier: GPL-2.0
+#
+# Copyright (c) 2022, 2024 Collabora Ltd
+#
+# This test validates the power supply uAPI: namely, the files in sysfs and
+# lines in uevent that expose the power supply properties.
+#
+# By default all power supplies available are tested. Optionally the name of a
+# power supply can be passed as a parameter to test only that one instead.
+DIR="$(dirname "$(readlink -f "$0")")"
+
+. "${DIR}"/../kselftest/ktap_helpers.sh
+
+. "${DIR}"/helpers.sh
+
+count_tests() {
+	SUPPLIES=$1
+
+	# This needs to be updated every time a new test is added.
+	NUM_TESTS=33
+
+	total_tests=0
+
+	for i in $SUPPLIES; do
+		total_tests=$(("$total_tests" + "$NUM_TESTS"))
+	done
+
+	echo "$total_tests"
+}
+
+ktap_print_header
+
+SYSFS_SUPPLIES=/sys/class/power_supply/
+
+if [ $# -eq 0 ]; then
+	supplies=$(ls "$SYSFS_SUPPLIES")
+else
+	supplies=$1
+fi
+
+ktap_set_plan "$(count_tests "$supplies")"
+
+for DEVNAME in $supplies; do
+	ktap_print_msg Testing device "$DEVNAME"
+
+	if [ ! -d "$SYSFS_SUPPLIES"/"$DEVNAME" ]; then
+		ktap_test_fail "$DEVNAME".exists
+		ktap_exit_fail_msg Device does not exist
+	fi
+
+	ktap_test_pass "$DEVNAME".exists
+
+	test_uevent_prop NAME "$DEVNAME"
+
+	test_sysfs_prop type
+	SUPPLY_TYPE=$(cat "$SYSFS_SUPPLIES"/"$DEVNAME"/type)
+	# This fails on kernels < 5.8 (needs 2ad3d74e3c69f)
+	test_uevent_prop TYPE "$SUPPLY_TYPE"
+
+	test_sysfs_prop_optional usb_type
+
+	test_sysfs_prop_optional_range online 0 2
+	test_sysfs_prop_optional_range present 0 1
+
+	test_sysfs_prop_optional_list status "Unknown","Charging","Discharging","Not charging","Full"
+
+	# Capacity is reported as percentage, thus any value less than 0 and
+	# greater than 100 are not allowed.
+	test_sysfs_prop_optional_range capacity 0 100 "%"
+
+	test_sysfs_prop_optional_list capacity_level "Unknown","Critical","Low","Normal","High","Full"
+
+	test_sysfs_prop_optional model_name
+	test_sysfs_prop_optional manufacturer
+	test_sysfs_prop_optional serial_number
+	test_sysfs_prop_optional_list technology "Unknown","NiMH","Li-ion","Li-poly","LiFe","NiCd","LiMn"
+
+	test_sysfs_prop_optional cycle_count
+
+	test_sysfs_prop_optional_list scope "Unknown","System","Device"
+
+	test_sysfs_prop_optional input_current_limit "uA"
+	test_sysfs_prop_optional input_voltage_limit "uV"
+
+	# Technically the power-supply class does not limit reported values.
+	# E.g. one could expose an RTC backup-battery, which goes below 1.5V or
+	# an electric vehicle battery with over 300V. But most devices do not
+	# have a step-up capable regulator behind the battery and operate with
+	# voltages considered safe to touch, so we limit the allowed range to
+	# 1.8V-60V to catch drivers reporting incorrectly scaled values. E.g. a
+	# common mistake is reporting data in mV instead of µV.
+	test_sysfs_prop_optional_range voltage_now 1800000 60000000 "uV"
+	test_sysfs_prop_optional_range voltage_min 1800000 60000000 "uV"
+	test_sysfs_prop_optional_range voltage_max 1800000 60000000 "uV"
+	test_sysfs_prop_optional_range voltage_min_design 1800000 60000000 "uV"
+	test_sysfs_prop_optional_range voltage_max_design 1800000 60000000 "uV"
+
+	# current based systems
+	test_sysfs_prop_optional current_now "uA"
+	test_sysfs_prop_optional current_max "uA"
+	test_sysfs_prop_optional charge_now "uAh"
+	test_sysfs_prop_optional charge_full "uAh"
+	test_sysfs_prop_optional charge_full_design "uAh"
+
+	# power based systems
+	test_sysfs_prop_optional power_now "uW"
+	test_sysfs_prop_optional energy_now "uWh"
+	test_sysfs_prop_optional energy_full "uWh"
+	test_sysfs_prop_optional energy_full_design "uWh"
+	test_sysfs_prop_optional energy_full_design "uWh"
+done
+
+ktap_finished
---
base-commit: f6b014bd664b49fe4b9aecd63de4179f81753e42
change-id: 20240122-power-supply-kselftest-8345017dcbdb

Best regards,
-- 
Nícolas F. R. A. Prado <nfraprado(a)collabora.com>
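The script's voltage bounds, unit conversion and list checks are simple enough to restate in C, which makes the intent of the 1.8 V to 60 V window concrete: a driver that reports millivolts where microvolts are expected produces a value about 1000 times too small and falls below the lower bound. A minimal sketch with hypothetical helper names, assuming integer µV input as read from voltage_now:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define VOLTAGE_MIN_UV 1800000L    /* 1.8 V, same bound as the script */
#define VOLTAGE_MAX_UV 60000000L   /* 60 V */

/* A value reported in mV instead of uV lands ~1000x too low and fails. */
int voltage_uv_plausible(long uv)
{
	return uv >= VOLTAGE_MIN_UV && uv <= VOLTAGE_MAX_UV;
}

/* Format a uV reading as volts, like to_human_readable_unit() with "uV". */
void format_uv(long uv, char *buf, size_t len)
{
	snprintf(buf, len, "%g V", uv / 1000000.0);
}

/* Exact-match lookup in a comma-separated list, like the "status" check. */
int value_in_list(const char *value, const char *list)
{
	size_t vlen = strlen(value);
	const char *p = list;

	while (*p) {
		const char *end = strchr(p, ',');
		size_t n = end ? (size_t)(end - p) : strlen(p);

		if (n == vlen && strncmp(p, value, n) == 0)
			return 1;
		if (!end)
			break;
		p = end + 1;
	}
	return 0;
}
```

For example, the `7600000` uV reading from the sample output passes the range check and formats as `7.6 V`, matching the script's awk-based output, while the same battery reported in mV (`7600`) is rejected.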

[PATCH net-next] selftests: net: add missing config for GENEVE

by Matthias May

l2_tos_ttl_inherit.sh verifies the inheritance of tos and ttl for GRETAP, VXLAN and GENEVE. Before testing, it checks whether the required module is available and, if not, skips the tests accordingly. Currently only GRETAP and VXLAN are tested because the GENEVE module is missing.

Signed-off-by: Matthias May <matthias.may(a)westermo.com>
---
 tools/testing/selftests/net/config | 1 +
 1 file changed, 1 insertion(+)

diff --git a/tools/testing/selftests/net/config b/tools/testing/selftests/net/config
index 19ff75051660..8d79c024bebf 100644
--- a/tools/testing/selftests/net/config
+++ b/tools/testing/selftests/net/config
@@ -76,6 +76,7 @@ CONFIG_CRYPTO_SM4_GENERIC=y
 CONFIG_AMT=m
 CONFIG_TUN=y
 CONFIG_VXLAN=m
+CONFIG_GENEVE=m
 CONFIG_IP_SCTP=m
 CONFIG_NETFILTER_XT_MATCH_POLICY=m
 CONFIG_CRYPTO_ARIA=y
-- 
2.39.2

[PATCH net-next] selftests/net: calibrate txtimestamp

by Willem de Bruijn

From: Willem de Bruijn <willemb(a)google.com>

The test sends packets and compares enqueue, transmit and Ack timestamps with expected values. It installs netem delays to increase latency between these points.

The test proves flaky in a virtual environment (vng). Increase the delays to reduce variance. Scale measurement tolerance accordingly.

Time sensitive tests are difficult to calibrate. Increasing delays 10x also increases runtime 10x, for one. And it may still prove flaky at some rate.

Signed-off-by: Willem de Bruijn <willemb(a)google.com>
---
 tools/testing/selftests/net/txtimestamp.sh | 12 +++++++-----
 1 file changed, 7 insertions(+), 5 deletions(-)

diff --git a/tools/testing/selftests/net/txtimestamp.sh b/tools/testing/selftests/net/txtimestamp.sh
index 31637769f59f..25baca4b148e 100755
--- a/tools/testing/selftests/net/txtimestamp.sh
+++ b/tools/testing/selftests/net/txtimestamp.sh
@@ -8,13 +8,13 @@ set -e
 
 setup() {
 	# set 1ms delay on lo egress
-	tc qdisc add dev lo root netem delay 1ms
+	tc qdisc add dev lo root netem delay 10ms
 
 	# set 2ms delay on ifb0 egress
 	modprobe ifb
 	ip link add ifb_netem0 type ifb
 	ip link set dev ifb_netem0 up
-	tc qdisc add dev ifb_netem0 root netem delay 2ms
+	tc qdisc add dev ifb_netem0 root netem delay 20ms
 
 	# redirect lo ingress through ifb0 egress
 	tc qdisc add dev lo handle ffff: ingress
@@ -24,9 +24,11 @@ setup() {
 }
 
 run_test_v4v6() {
-	# SND will be delayed 1000us
-	# ACK will be delayed 6000us: 1 + 2 ms round-trip
-	local -r args="$@ -v 1000 -V 6000"
+	# SND will be delayed 10ms
+	# ACK will be delayed 60ms: 10 + 20 ms round-trip
+	# allow +/- tolerance of 8ms
+	# wait for ACK to be queued
+	local -r args="$@ -v 10000 -V 60000 -t 8000 -S 80000"
 
 	./txtimestamp ${args} -4 -L 127.0.0.1
 	./txtimestamp ${args} -6 -L ::1
-- 
2.43.0.429.g432eaa2c6b-goog
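The new values follow from the netem setup: data and its ACK each cross lo egress and then, via the ingress redirect, ifb egress, so the ACK delay is twice the sum of the two qdisc delays (2 × (1 + 2) = 6 ms before, 2 × (10 + 20) = 60 ms after), while SND sees only the lo delay. A C sketch of that arithmetic plus a symmetric ±tolerance check; how txtimestamp.c actually applies `-t` is an assumption here, and the helper names are hypothetical:

```c
#include <assert.h>
#include <stdlib.h>

/* Expected SND delay: only the lo egress netem delay applies. */
long expected_snd_us(long lo_delay_us)
{
	return lo_delay_us;
}

/* Expected ACK delay: data and ACK each cross lo egress and ifb egress. */
long expected_ack_us(long lo_delay_us, long ifb_delay_us)
{
	return 2 * (lo_delay_us + ifb_delay_us);
}

/* Assumed check: a measurement passes within +/- tol_us of expected. */
int within_tolerance(long measured_us, long expected_us, long tol_us)
{
	return labs(measured_us - expected_us) <= tol_us;
}
```

With the new 8 ms tolerance, an ACK measured at 67 ms passes and one at 69 ms fails; scaling the delays 10x widens the absolute slack available to absorb scheduling noise in a VM.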

[PATCH v7 0/4] Introduce mseal()

by jeffxu＠chromium.org

From: Jeff Xu <jeffxu(a)chromium.org> This patchset proposes a new mseal() syscall for the Linux kernel. In a nutshell, mseal() protects the VMAs of a given virtual memory range against modifications, such as changes to their permission bits. Modern CPUs support memory permissions, such as the read/write (RW) and no-execute (NX) bits. Linux has supported NX since the release of kernel version 2.6.8 in August 2004 [1]. The memory permission feature improves the security stance on memory corruption bugs, as an attacker cannot simply write to arbitrary memory and point the code to it. The memory must be marked with the X bit, or else an exception will occur. Internally, the kernel maintains the memory permissions in a data structure called VMA (vm_area_struct). mseal() additionally protects the VMA itself against modifications of the selected seal type. Memory sealing is useful to mitigate memory corruption issues where a corrupted pointer is passed to a memory management system. For example, such an attacker primitive can break control-flow integrity guarantees since read-only memory that is supposed to be trusted can become writable or .text pages can get remapped. Memory sealing can automatically be applied by the runtime loader to seal .text and .rodata pages and applications can additionally seal security critical data at runtime. A similar feature already exists in the XNU kernel with the VM_FLAGS_PERMANENT [3] flag and on OpenBSD with the mimmutable syscall [4]. Also, Chrome wants to adopt this feature for their CFI work [2] and this patchset has been designed to be compatible with the Chrome use case. Two system calls are involved in sealing the map: mmap() and mseal(). The new mseal() is an syscall on 64 bit CPU, and with following signature: int mseal(void addr, size_t len, unsigned long flags) addr/len: memory range. flags: reserved. mseal() blocks following operations for the given memory range. 
1> Unmapping, moving to another location, and shrinking the size, via munmap() and mremap(), can leave an empty space, therefore can be replaced with a VMA with a new set of attributes. 2> Moving or expanding a different VMA into the current location, via mremap(). 3> Modifying a VMA via mmap(MAP_FIXED). 4> Size expansion, via mremap(), does not appear to pose any specific risks to sealed VMAs. It is included anyway because the use case is unclear. In any case, users can rely on merging to expand a sealed VMA. 5> mprotect() and pkey_mprotect(). 6> Some destructive madvice() behaviors (e.g. MADV_DONTNEED) for anonymous memory, when users don't have write permission to the memory. Those behaviors can alter region contents by discarding pages, effectively a memset(0) for anonymous memory. In addition: mmap() has two related changes. The PROT_SEAL bit in prot field of mmap(). When present, it marks the map sealed since creation. The MAP_SEALABLE bit in the flags field of mmap(). When present, it marks the map as sealable. A map created without MAP_SEALABLE will not support sealing, i.e. mseal() will fail. Applications that don't care about sealing will expect their behavior unchanged. For those that need sealing support, opt-in by adding MAP_SEALABLE in mmap(). The idea that inspired this patch comes from Stephen Röttger’s work in V8 CFI [5]. Chrome browser in ChromeOS will be the first user of this API. Indeed, the Chrome browser has very specific requirements for sealing, which are distinct from those of most applications. For example, in the case of libc, sealing is only applied to read-only (RO) or read-execute (RX) memory segments (such as .text and .RELRO) to prevent them from becoming writable, the lifetime of those mappings are tied to the lifetime of the process. Chrome wants to seal two large address space reservations that are managed by different allocators. 
The memory is mapped RW- and RWX respectively but write access to it is restricted using pkeys (or in the future ARM permission overlay extensions). The lifetime of those mappings are not tied to the lifetime of the process, therefore, while the memory is sealed, the allocators still need to free or discard the unused memory. For example, with madvise(DONTNEED). However, always allowing madvise(DONTNEED) on this range poses a security risk. For example if a jump instruction crosses a page boundary and the second page gets discarded, it will overwrite the target bytes with zeros and change the control flow. Checking write-permission before the discard operation allows us to control when the operation is valid. In this case, the madvise will only succeed if the executing thread has PKEY write permissions and PKRU changes are protected in software by control-flow integrity. Although the initial version of this patch series is targeting the Chrome browser as its first user, it became evident during upstream discussions that we would also want to ensure that the patch set eventually is a complete solution for memory sealing and compatible with other use cases. The specific scenario currently in mind is glibc's use case of loading and sealing ELF executables. To this end, Stephen is working on a change to glibc to add sealing support to the dynamic linker, which will seal all non-writable segments at startup. Once this work is completed, all applications will be able to automatically benefit from these new protections. In closing, I would like to formally acknowledge the valuable contributions received during the RFC process, which were instrumental in shaping this patch: Jann Horn: raising awareness and providing valuable insights on the destructive madvise operations. Linus Torvalds: assisting in defining system call signature and scope. Pedro Falcato: suggesting sealing in the mmap(). 
Theo de Raadt: sharing the experiences and insights gained from implementing mimmutable() in OpenBSD.

Change history:
===============
V7:
- fix index.rst (Randy Dunlap)
- fix arm build (Randy Dunlap)
- return EPERM for blocked operations (Theo de Raadt)

V6:
- Drop RFC from subject, Given Linus's general approval.
- Adjust syscall number for mseal (main Jan.11/2024)
- Code style fix (Matthew Wilcox)
- selftest: use ksft macros (Muhammad Usama Anjum)
- Document fix. (Randy Dunlap)
https://lore.kernel.org/all/20240111234142.2944934-1-jeffxu@chromium.org/

V5:
- fix build issue in mseal-Wire-up-mseal-syscall (Suggested by Linus Torvalds, and Greg KH)
- updates on selftest.
https://lore.kernel.org/lkml/20240109154547.1839886-1-jeffxu@chromium.org/#r

V4:
(Suggested by Linus Torvalds)
- new signature: mseal(start,len,flags)
- 32 bit is not supported. vm_seal is removed, use vm_flags instead.
- single bit in vm_flags for sealed state.
- CONFIG_MSEAL kernel config is removed.
- single bit of PROT_SEAL in the "Prot" field of mmap().
Other changes:
- update selftest (Suggested by Muhammad Usama Anjum)
- update documentation.
https://lore.kernel.org/all/20240104185138.169307-1-jeffxu@chromium.org/

V3:
- Abandon per-syscall approach, (Suggested by Linus Torvalds).
- Organize sealing types around their functionality, such as MM_SEAL_BASE, MM_SEAL_PROT_PKEY.
- Extend the scope of sealing from calls originated in userspace to both kernel and userspace. (Suggested by Linus Torvalds)
- Add seal type support in mmap(). (Suggested by Pedro Falcato)
- Add a new sealing type: MM_SEAL_DISCARD_RO_ANON to prevent destructive operations of madvise. (Suggested by Jann Horn and Stephen Röttger)
- Make sealed VMAs mergeable. (Suggested by Jann Horn)
- Add MAP_SEALABLE to mmap()
- Add documentation - mseal.rst
https://lore.kernel.org/linux-mm/20231212231706.2680890-2-jeffxu@chromium.o…

v2:
Use _BITUL to define MM_SEAL_XX type.
Use unsigned long for seal type in sys_mseal() and other functions.
Remove internal VM_SEAL_XX type and convert_user_seal_type().
Remove MM_ACTION_XX type.
Remove caller_origin(ON_BEHALF_OF_XX) and replace with sealing bitmask.
Add more comments in code.
Add a detailed commit message.
https://lore.kernel.org/lkml/20231017090815.1067790-1-jeffxu@chromium.org/

v1:
https://lore.kernel.org/lkml/20231016143828.647848-1-jeffxu@chromium.org/

----------------------------------------------------------------
[1] https://kernelnewbies.org/Linux_2_6_8
[2] https://v8.dev/blog/control-flow-integrity
[3] https://github.com/apple-oss-distributions/xnu/blob/1031c584a5e37aff177559b…
[4] https://man.openbsd.org/mimmutable.2
[5] https://docs.google.com/document/d/1O2jwK4dxI3nRcOJuPYkonhTkNQfbmwdvxQMyXge…
[6] https://lore.kernel.org/lkml/CAG48ez3ShUYey+ZAFsU2i1RpQn0a5eOs2hzQ426Fkcgnf…
[7] https://lore.kernel.org/lkml/20230515130553.2311248-1-jeffxu@chromium.org/

Jeff Xu (4):
  mseal: Wire up mseal syscall
  mseal: add mseal syscall
  selftest mm/mseal memory sealing
  mseal:add documentation

 Documentation/userspace-api/index.rst       |    1 +
 Documentation/userspace-api/mseal.rst       |  183 ++
 arch/alpha/kernel/syscalls/syscall.tbl      |    1 +
 arch/arm/tools/syscall.tbl                  |    1 +
 arch/arm64/include/asm/unistd.h             |    2 +-
 arch/arm64/include/asm/unistd32.h           |    2 +
 arch/m68k/kernel/syscalls/syscall.tbl       |    1 +
 arch/microblaze/kernel/syscalls/syscall.tbl |    1 +
 arch/mips/kernel/syscalls/syscall_n32.tbl   |    1 +
 arch/mips/kernel/syscalls/syscall_n64.tbl   |    1 +
 arch/mips/kernel/syscalls/syscall_o32.tbl   |    1 +
 arch/parisc/kernel/syscalls/syscall.tbl     |    1 +
 arch/powerpc/kernel/syscalls/syscall.tbl    |    1 +
 arch/s390/kernel/syscalls/syscall.tbl       |    1 +
 arch/sh/kernel/syscalls/syscall.tbl         |    1 +
 arch/sparc/kernel/syscalls/syscall.tbl      |    1 +
 arch/x86/entry/syscalls/syscall_32.tbl      |    1 +
 arch/x86/entry/syscalls/syscall_64.tbl      |    1 +
 arch/xtensa/kernel/syscalls/syscall.tbl     |    1 +
 include/linux/mm.h                          |   48 +
 include/linux/syscalls.h                    |    1 +
 include/uapi/asm-generic/mman-common.h      |    8 +
 include/uapi/asm-generic/unistd.h           |    5 +-
 kernel/sys_ni.c                             |    1 +
 mm/Makefile                                 |    4 +
 mm/madvise.c                                |   12 +
 mm/mmap.c                                   |   27 +
 mm/mprotect.c                               |   10 +
 mm/mremap.c                                 |   31 +
 mm/mseal.c                                  |  343 ++++
 tools/testing/selftests/mm/.gitignore       |    1 +
 tools/testing/selftests/mm/Makefile         |    1 +
 tools/testing/selftests/mm/mseal_test.c     | 1997 +++++++++++++++++++
 33 files changed, 2690 insertions(+), 2 deletions(-)
 create mode 100644 Documentation/userspace-api/mseal.rst
 create mode 100644 mm/mseal.c
 create mode 100644 tools/testing/selftests/mm/mseal_test.c

-- 
2.43.0.429.g432eaa2c6b-goog
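Items 3 and 6 in the list above are operations that sealing blocks. The following user-space sketch (not part of this series; the helper names are hypothetical) shows what those two calls do to an ordinary, unsealed private anonymous mapping: both silently turn previously written bytes into zeros, which is the kind of content change mseal() is meant to rule out. It uses only standard Linux mmap()/madvise() calls.

```c
#define _DEFAULT_SOURCE		/* for MAP_ANONYMOUS and MADV_DONTNEED */
#include <assert.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Map one private anonymous page and fill it with 0xAA; NULL on failure. */
static unsigned char *map_filled_page(size_t len)
{
	unsigned char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
				MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (p == MAP_FAILED)
		return NULL;
	memset(p, 0xAA, len);
	return p;
}

/* Item 3: mmap(MAP_FIXED) silently replaces whatever was mapped there. */
static int first_byte_after_map_fixed(void)
{
	size_t len = (size_t)sysconf(_SC_PAGESIZE);
	unsigned char *p = map_filled_page(len);

	if (!p)
		return -1;
	if (mmap(p, len, PROT_READ | PROT_WRITE,
		 MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0) == MAP_FAILED) {
		munmap(p, len);
		return -1;
	}
	int b = *(volatile unsigned char *)p;	/* fresh mapping reads as 0 */
	munmap(p, len);
	return b;
}

/* Item 6: MADV_DONTNEED on private anonymous memory acts like memset(0). */
static int first_byte_after_madv_dontneed(void)
{
	size_t len = (size_t)sysconf(_SC_PAGESIZE);
	unsigned char *p = map_filled_page(len);

	if (!p)
		return -1;
	if (madvise(p, len, MADV_DONTNEED) != 0) {
		munmap(p, len);
		return -1;
	}
	int b = *(volatile unsigned char *)p;	/* page was discarded: 0 */
	munmap(p, len);
	return b;
}
```

Both helpers return 0 on an unsealed mapping (and -1 only if a system call fails). Per the V7 change log, the same calls on a sealed VMA would instead fail with EPERM; that path is not exercised here because it needs a kernel carrying this series.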

1 year, 11 months
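The madvise() rule from item 6 and the PKEY discussion in the mseal cover letter above can be written as a small decision function. This is a simplified model under stated assumptions, not the code in mm/mseal.c: struct vma_model and model_madvise_check() are hypothetical, "anonymous" is reduced to a single flag, and treating MADV_FREE as destructive alongside MADV_DONTNEED is an assumption (the cover letter only names MADV_DONTNEED). The -EPERM result follows the V7 change log entry "return EPERM for blocked operations".

```c
#define _DEFAULT_SOURCE		/* for the MADV_* constants */
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <sys/mman.h>

/* Hypothetical, simplified view of one VMA, for this model only. */
struct vma_model {
	bool sealed;		/* mseal() has been applied */
	bool anonymous;		/* anonymous memory (not file-backed) */
	bool vm_write;		/* the mapping itself is writable */
	bool pkey_write_ok;	/* the thread's PKRU permits writes to its pkey */
};

/* Behaviors that discard page contents; including MADV_FREE is an assumption. */
static bool is_discard_behavior(int behavior)
{
	switch (behavior) {
	case MADV_DONTNEED:
#ifdef MADV_FREE
	case MADV_FREE:
#endif
		return true;
	default:
		return false;
	}
}

/* Returns 0 if the model allows the madvise() call, -EPERM if it blocks it. */
static int model_madvise_check(const struct vma_model *vma, int behavior)
{
	if (!vma->sealed || !vma->anonymous || !is_discard_behavior(behavior))
		return 0;
	/* Discarding sealed anonymous memory requires effective write access. */
	if (vma->vm_write && vma->pkey_write_ok)
		return 0;
	return -EPERM;
}
```

In the Chrome scenario, this is why a sealed RW- or RWX reservation can still be discarded by its allocator: the discard succeeds only while the thread holds pkey write permission, and PKRU changes themselves are guarded by control-flow integrity.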

[PATCH v2 1/1] userfaultfd: handle zeropage moves by UFFDIO_MOVE

by Suren Baghdasaryan

Current implementation of UFFDIO_MOVE fails to move zeropages and returns EBUSY when it encounters one. We can handle them by mapping a zeropage at the destination and clearing the mapping at the source. This is done both for ordinary and for huge zeropages.

Reported-by: kernel test robot <lkp(a)intel.com>
Reported-by: Dan Carpenter <dan.carpenter(a)linaro.org>
Closes: https://lore.kernel.org/r/202401300107.U8iMAkTl-lkp@intel.com/
Signed-off-by: Suren Baghdasaryan <surenb(a)google.com>
---
Changes since v1 [1]
- Added missing double_pt_unlock in move_zeropage_pte, per Dan Carpenter
- Added Reported-by and Closes tags per bug report [2]

Applies cleanly over mm-unstable branch.

[1] https://lore.kernel.org/all/20240125001328.335127-1-surenb@google.com/
[2] https://lore.kernel.org/all/202401300107.U8iMAkTl-lkp@intel.com/

 mm/huge_memory.c | 105 +++++++++++++++++++++++++++--------------------
 mm/userfaultfd.c |  44 ++++++++++++++++----
 2 files changed, 98 insertions(+), 51 deletions(-)

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index f005f0424735..016e20bd813e 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -2200,13 +2200,18 @@ int move_pages_huge_pmd(struct mm_struct *mm, pmd_t *dst_pmd, pmd_t *src_pmd, pm
 	}
 
 	src_page = pmd_page(src_pmdval);
-	if (unlikely(!PageAnonExclusive(src_page))) {
-		spin_unlock(src_ptl);
-		return -EBUSY;
-	}
 
-	src_folio = page_folio(src_page);
-	folio_get(src_folio);
+	if (!is_huge_zero_pmd(src_pmdval)) {
+		if (unlikely(!PageAnonExclusive(src_page))) {
+			spin_unlock(src_ptl);
+			return -EBUSY;
+		}
+
+		src_folio = page_folio(src_page);
+		folio_get(src_folio);
+	} else
+		src_folio = NULL;
+
 	spin_unlock(src_ptl);
 
 	flush_cache_range(src_vma, src_addr, src_addr + HPAGE_PMD_SIZE);
@@ -2214,19 +2219,22 @@ int move_pages_huge_pmd(struct mm_struct *mm, pmd_t *dst_pmd, pmd_t *src_pmd, pm
 				src_addr + HPAGE_PMD_SIZE);
 	mmu_notifier_invalidate_range_start(&range);
 
-	folio_lock(src_folio);
+	if (src_folio) {
+		folio_lock(src_folio);
 
-	/*
-	 * split_huge_page walks the anon_vma chain without the page
-	 * lock. Serialize against it with the anon_vma lock, the page
-	 * lock is not enough.
-	 */
-	src_anon_vma = folio_get_anon_vma(src_folio);
-	if (!src_anon_vma) {
-		err = -EAGAIN;
-		goto unlock_folio;
-	}
-	anon_vma_lock_write(src_anon_vma);
+		/*
+		 * split_huge_page walks the anon_vma chain without the page
+		 * lock. Serialize against it with the anon_vma lock, the page
+		 * lock is not enough.
+		 */
+		src_anon_vma = folio_get_anon_vma(src_folio);
+		if (!src_anon_vma) {
+			err = -EAGAIN;
+			goto unlock_folio;
+		}
+		anon_vma_lock_write(src_anon_vma);
+	} else
+		src_anon_vma = NULL;
 
 	dst_ptl = pmd_lockptr(mm, dst_pmd);
 	double_pt_lock(src_ptl, dst_ptl);
@@ -2235,45 +2243,54 @@ int move_pages_huge_pmd(struct mm_struct *mm, pmd_t *dst_pmd, pmd_t *src_pmd, pm
 		err = -EAGAIN;
 		goto unlock_ptls;
 	}
-	if (folio_maybe_dma_pinned(src_folio) ||
-	    !PageAnonExclusive(&src_folio->page)) {
-		err = -EBUSY;
-		goto unlock_ptls;
-	}
+	if (src_folio) {
+		if (folio_maybe_dma_pinned(src_folio) ||
+		    !PageAnonExclusive(&src_folio->page)) {
+			err = -EBUSY;
+			goto unlock_ptls;
+		}
 
-	if (WARN_ON_ONCE(!folio_test_head(src_folio)) ||
-	    WARN_ON_ONCE(!folio_test_anon(src_folio))) {
-		err = -EBUSY;
-		goto unlock_ptls;
-	}
+		if (WARN_ON_ONCE(!folio_test_head(src_folio)) ||
+		    WARN_ON_ONCE(!folio_test_anon(src_folio))) {
+			err = -EBUSY;
+			goto unlock_ptls;
+		}
 
-	folio_move_anon_rmap(src_folio, dst_vma);
-	WRITE_ONCE(src_folio->index, linear_page_index(dst_vma, dst_addr));
+		folio_move_anon_rmap(src_folio, dst_vma);
+		WRITE_ONCE(src_folio->index, linear_page_index(dst_vma, dst_addr));
 
-	src_pmdval = pmdp_huge_clear_flush(src_vma, src_addr, src_pmd);
-	/* Folio got pinned from under us. Put it back and fail the move. */
-	if (folio_maybe_dma_pinned(src_folio)) {
-		set_pmd_at(mm, src_addr, src_pmd, src_pmdval);
-		err = -EBUSY;
-		goto unlock_ptls;
-	}
+		src_pmdval = pmdp_huge_clear_flush(src_vma, src_addr, src_pmd);
+		/* Folio got pinned from under us. Put it back and fail the move. */
+		if (folio_maybe_dma_pinned(src_folio)) {
+			set_pmd_at(mm, src_addr, src_pmd, src_pmdval);
+			err = -EBUSY;
+			goto unlock_ptls;
+		}
 
-	_dst_pmd = mk_huge_pmd(&src_folio->page, dst_vma->vm_page_prot);
-	/* Follow mremap() behavior and treat the entry dirty after the move */
-	_dst_pmd = pmd_mkwrite(pmd_mkdirty(_dst_pmd), dst_vma);
+		_dst_pmd = mk_huge_pmd(&src_folio->page, dst_vma->vm_page_prot);
+		/* Follow mremap() behavior and treat the entry dirty after the move */
+		_dst_pmd = pmd_mkwrite(pmd_mkdirty(_dst_pmd), dst_vma);
+	} else {
+		src_pmdval = pmdp_huge_clear_flush(src_vma, src_addr, src_pmd);
+		_dst_pmd = mk_huge_pmd(src_page, dst_vma->vm_page_prot);
+	}
 	set_pmd_at(mm, dst_addr, dst_pmd, _dst_pmd);
 
 	src_pgtable = pgtable_trans_huge_withdraw(mm, src_pmd);
 	pgtable_trans_huge_deposit(mm, dst_pmd, src_pgtable);
 unlock_ptls:
 	double_pt_unlock(src_ptl, dst_ptl);
-	anon_vma_unlock_write(src_anon_vma);
-	put_anon_vma(src_anon_vma);
+	if (src_anon_vma) {
+		anon_vma_unlock_write(src_anon_vma);
+		put_anon_vma(src_anon_vma);
+	}
 unlock_folio:
 	/* unblock rmap walks */
-	folio_unlock(src_folio);
+	if (src_folio)
+		folio_unlock(src_folio);
 	mmu_notifier_invalidate_range_end(&range);
-	folio_put(src_folio);
+	if (src_folio)
+		folio_put(src_folio);
 	return err;
 }
 #endif /* CONFIG_USERFAULTFD */
diff --git a/mm/userfaultfd.c b/mm/userfaultfd.c
index ae80c3714829..9cc93cc1330b 100644
--- a/mm/userfaultfd.c
+++ b/mm/userfaultfd.c
@@ -959,6 +959,33 @@ static int move_swap_pte(struct mm_struct *mm,
 	return 0;
 }
 
+static int move_zeropage_pte(struct mm_struct *mm,
+			     struct vm_area_struct *dst_vma,
+			     struct vm_area_struct *src_vma,
+			     unsigned long dst_addr, unsigned long src_addr,
+			     pte_t *dst_pte, pte_t *src_pte,
+			     pte_t orig_dst_pte, pte_t orig_src_pte,
+			     spinlock_t *dst_ptl, spinlock_t *src_ptl)
+{
+	pte_t zero_pte;
+
+	double_pt_lock(dst_ptl, src_ptl);
+	if (!pte_same(ptep_get(src_pte), orig_src_pte) ||
+	    !pte_same(ptep_get(dst_pte), orig_dst_pte)) {
+		double_pt_unlock(dst_ptl, src_ptl);
+		return -EAGAIN;
+	}
+
+	zero_pte = pte_mkspecial(pfn_pte(my_zero_pfn(dst_addr),
+					 dst_vma->vm_page_prot));
+	ptep_clear_flush(src_vma, src_addr, src_pte);
+	set_pte_at(mm, dst_addr, dst_pte, zero_pte);
+	double_pt_unlock(dst_ptl, src_ptl);
+
+	return 0;
+}
+
+
 /*
  * The mmap_lock for reading is held by the caller. Just move the page
  * from src_pmd to dst_pmd if possible, and return true if succeeded
@@ -1041,6 +1068,14 @@ static int move_pages_pte(struct mm_struct *mm, pmd_t *dst_pmd, pmd_t *src_pmd,
 	}
 
 	if (pte_present(orig_src_pte)) {
+		if (is_zero_pfn(pte_pfn(orig_src_pte))) {
+			err = move_zeropage_pte(mm, dst_vma, src_vma,
+					       dst_addr, src_addr, dst_pte, src_pte,
+					       orig_dst_pte, orig_src_pte,
+					       dst_ptl, src_ptl);
+			goto out;
+		}
+
 		/*
 		 * Pin and lock both source folio and anon_vma. Since we are in
 		 * RCU read section, we can't block, so on contention have to
@@ -1404,19 +1439,14 @@ ssize_t move_pages(struct userfaultfd_ctx *ctx, struct mm_struct *mm,
 				err = -ENOENT;
 				break;
 			}
 
-			/* Avoid moving zeropages for now */
-			if (is_huge_zero_pmd(*src_pmd)) {
-				spin_unlock(ptl);
-				err = -EBUSY;
-				break;
-			}
 			/* Check if we can move the pmd without splitting it. */
 			if (move_splits_huge_pmd(dst_addr, src_addr, src_start + len) ||
 			    !pmd_none(dst_pmdval)) {
 				struct folio *folio = pfn_folio(pmd_pfn(*src_pmd));
 
-				if (!folio || !PageAnonExclusive(&folio->page)) {
+				if (!folio || (!is_huge_zero_page(&folio->page) &&
+					       !PageAnonExclusive(&folio->page))) {
 					spin_unlock(ptl);
 					err = -EBUSY;
 					break;
-- 
2.43.0.429.g432eaa2c6b-goog
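The new move_zeropage_pte() follows a snapshot-and-revalidate pattern: move_pages_pte() reads both PTEs first (orig_src_pte, orig_dst_pte), and the helper then takes both page-table locks, confirms with pte_same() that neither entry changed, and only then clears the source and installs a zeropage PTE at the destination; otherwise it returns -EAGAIN so the caller can retry. The single-threaded user-space model below sketches that pattern under loud assumptions: toy_pte, toy_lock, PTE_NONE, ZERO_PFN and the toy_* helpers are hypothetical stand-ins for pte_t, spinlock_t and my_zero_pfn(), and there is no TLB flush, special-PTE bit or real page table. Taking the two locks in address order, and only once when they are the same lock, is a standard way to avoid deadlock when source and destination may share a page-table lock; the sketch does not claim to reproduce double_pt_lock() exactly.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins: an empty entry and an entry mapping the zeropage. */
#define PTE_NONE 0UL
#define ZERO_PFN 1UL

/* Minimal spinlock standing in for a page-table lock. */
typedef struct {
	atomic_flag held;
} toy_lock;

static void toy_lock_acquire(toy_lock *l)
{
	while (atomic_flag_test_and_set_explicit(&l->held, memory_order_acquire))
		;	/* spin until released */
}

static void toy_lock_release(toy_lock *l)
{
	atomic_flag_clear_explicit(&l->held, memory_order_release);
}

/* One page-table entry plus the lock that protects it. */
struct toy_pte {
	unsigned long val;
	toy_lock *ptl;
};

/* Lock both in address order to avoid ABBA deadlock; lock once if shared. */
static void toy_double_lock(toy_lock *a, toy_lock *b)
{
	if (a == b) {
		toy_lock_acquire(a);
		return;
	}
	if ((uintptr_t)a > (uintptr_t)b) {
		toy_lock *tmp = a;
		a = b;
		b = tmp;
	}
	toy_lock_acquire(a);
	toy_lock_acquire(b);
}

static void toy_double_unlock(toy_lock *a, toy_lock *b)
{
	toy_lock_release(a);
	if (a != b)
		toy_lock_release(b);
}

/* Recheck the lockless snapshots under both locks, then move or bail out. */
static int toy_move_zeropage(struct toy_pte *dst, struct toy_pte *src,
			     unsigned long orig_dst, unsigned long orig_src)
{
	toy_double_lock(dst->ptl, src->ptl);
	if (src->val != orig_src || dst->val != orig_dst) {
		toy_double_unlock(dst->ptl, src->ptl);	/* release on this path too */
		return -EAGAIN;
	}
	src->val = PTE_NONE;	/* corresponds to ptep_clear_flush() */
	dst->val = ZERO_PFN;	/* corresponds to set_pte_at() with a zeropage PTE */
	toy_double_unlock(dst->ptl, src->ptl);
	return 0;
}

/* Usage helper: snapshot, optionally simulate a racing change, then move. */
static int toy_demo(bool racing_update, bool shared_lock)
{
	toy_lock l1 = { ATOMIC_FLAG_INIT }, l2 = { ATOMIC_FLAG_INIT };
	struct toy_pte src = { ZERO_PFN, &l1 };
	struct toy_pte dst = { PTE_NONE, shared_lock ? &l1 : &l2 };
	unsigned long orig_src = src.val, orig_dst = dst.val;

	if (racing_update)
		src.val = PTE_NONE;	/* e.g. the source was zapped meanwhile */

	int err = toy_move_zeropage(&dst, &src, orig_dst, orig_src);
	if (err == 0)
		return (dst.val == ZERO_PFN && src.val == PTE_NONE) ? 0 : -1;
	return dst.val == PTE_NONE ? err : -1;	/* dst must be untouched */
}
```

toy_demo(true, false) simulates the source entry changing between the snapshot and the locked recheck: the move reports -EAGAIN and leaves the destination empty, which is what makes a retry by the caller safe.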

1 year, 11 months
