For now, we did not support reliable R/O long-term pinning in COW mappings.
That means, if we would trigger R/O long-term pinning in MAP_PRIVATE
mapping, we could end up pinning the (R/O-mapped) shared zeropage or a
pagecache page.
The next write access would trigger a write fault and replace the pinned
page by an exclusive anonymous page in the process page table; whatever the
process would write to that private page copy would not be visible by the
owner of the previous page pin: for example, RDMA could read stale data.
The end result is essentially an unexpected and hard-to-debug memory
corruption.
Some drivers tried working around that limitation by using
"FOLL_FORCE|FOLL_WRITE|FOLL_LONGTERM" for R/O long-term pinning for now.
FOLL_WRITE would trigger a write fault, if required, and break COW before
pinning the page. FOLL_FORCE is required because the VMA might lack write
permissions, and drivers wanted to make that working as well, just like
one would expect (no write access, but still triggering a write access to
break COW).
However, that is not a practical solution, because
(1) Drivers that don't stick to that undocumented and debatable pattern
would still run into that issue. For example, VFIO only uses
FOLL_LONGTERM for R/O long-term pinning.
(2) Using FOLL_WRITE just to work around a COW mapping + page pinning
limitation is unintuitive. FOLL_WRITE would, for example, mark the
page softdirty or trigger uffd-wp, even though, there actually isn't
going to be any write access.
(3) The purpose of FOLL_FORCE is debug access, not access without lack of
VMA permissions by arbitrarty drivers.
So instead, make R/O long-term pinning work as expected, by breaking COW
in a COW mapping early, such that we can remove any FOLL_FORCE usage from
drivers. More details in patch #8.
Patches #1--#3 add COW tests for non-anonymous pages.
Patches #4--#7 prepare core MM for extended FAULT_FLAG_UNSHARE support in
COW mappings.
Patch #8 implements reliable R/O long-term pinning in COW mappings
Patches #9--#19 remove any FOLL_FORCE usage from drivers.
I'm refraining from CCing all driver maintainers on the whole patch set.
Cc: Linus Torvalds <torvalds(a)linux-foundation.org>
Cc: Andrew Morton <akpm(a)linux-foundation.org>
Cc: Jason Gunthorpe <jgg(a)ziepe.ca>
Cc: John Hubbard <jhubbard(a)nvidia.com>
Cc: Peter Xu <peterx(a)redhat.com>
Cc: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Cc: Andrea Arcangeli <aarcange(a)redhat.com>
Cc: Hugh Dickins <hughd(a)google.com>
Cc: Nadav Amit <namit(a)vmware.com>
Cc: Vlastimil Babka <vbabka(a)suse.cz>
Cc: Matthew Wilcox (Oracle) <willy(a)infradead.org>
Cc: Mike Kravetz <mike.kravetz(a)oracle.com>
Cc: Muchun Song <songmuchun(a)bytedance.com>
Cc: Shuah Khan <shuah(a)kernel.org
Cc: Lucas Stach <l.stach(a)pengutronix.de>
Cc: David Airlie <airlied(a)gmail.com>
Cc: Oded Gabbay <ogabbay(a)kernel.org>
Cc: Arnd Bergmann <arnd(a)arndb.de>
David Hildenbrand (19):
selftests/vm: anon_cow: prepare for non-anonymous COW tests
selftests/vm: cow: basic COW tests for non-anonymous pages
selftests/vm: cow: R/O long-term pinning reliability tests for
non-anon pages
mm: add early FAULT_FLAG_UNSHARE consistency checks
mm: add early FAULT_FLAG_WRITE consistency checks
mm: rework handling in do_wp_page() based on private vs. shared
mappings
mm: don't call vm_ops->huge_fault() in wp_huge_pmd()/wp_huge_pud() for
private mappings
mm: extend FAULT_FLAG_UNSHARE support to anything in a COW mapping
mm/gup: reliable R/O long-term pinning in COW mappings
RDMA/umem: remove FOLL_FORCE usage
RDMA/usnic: remove FOLL_FORCE usage
RDMA/siw: remove FOLL_FORCE usage
media: videobuf-dma-sg: remove FOLL_FORCE usage
drm/etnaviv: remove FOLL_FORCE usage
media: pci/ivtv: remove FOLL_FORCE usage
mm/frame-vector: remove FOLL_FORCE usage
drm/exynos: remove FOLL_FORCE usage
RDMA/hw/qib/qib_user_pages: remove FOLL_FORCE usage
habanalabs: remove FOLL_FORCE usage
drivers/gpu/drm/etnaviv/etnaviv_gem.c | 8 +-
drivers/gpu/drm/exynos/exynos_drm_g2d.c | 2 +-
drivers/infiniband/core/umem.c | 8 +-
drivers/infiniband/hw/qib/qib_user_pages.c | 2 +-
drivers/infiniband/hw/usnic/usnic_uiom.c | 9 +-
drivers/infiniband/sw/siw/siw_mem.c | 9 +-
drivers/media/common/videobuf2/frame_vector.c | 2 +-
drivers/media/pci/ivtv/ivtv-udma.c | 2 +-
drivers/media/pci/ivtv/ivtv-yuv.c | 5 +-
drivers/media/v4l2-core/videobuf-dma-sg.c | 14 +-
drivers/misc/habanalabs/common/memory.c | 3 +-
include/linux/mm.h | 27 +-
include/linux/mm_types.h | 8 +-
mm/gup.c | 10 +-
mm/huge_memory.c | 5 +-
mm/hugetlb.c | 12 +-
mm/memory.c | 97 +++--
tools/testing/selftests/vm/.gitignore | 2 +-
tools/testing/selftests/vm/Makefile | 10 +-
tools/testing/selftests/vm/check_config.sh | 4 +-
.../selftests/vm/{anon_cow.c => cow.c} | 387 +++++++++++++++++-
tools/testing/selftests/vm/run_vmtests.sh | 8 +-
22 files changed, 516 insertions(+), 118 deletions(-)
rename tools/testing/selftests/vm/{anon_cow.c => cow.c} (74%)
--
2.38.1
Verify when a bond is configured with {up,down}delay and the link state
of slave members flaps if there are no remaining members up the bond
should immediately select a member to bring up. (from bonding.txt
section 13.1 paragraph 4)
Suggested-by: Liang Li <liali(a)redhat.com>
Signed-off-by: Jonathan Toppins <jtoppins(a)redhat.com>
---
.../selftests/drivers/net/bonding/Makefile | 4 +-
.../selftests/drivers/net/bonding/lag_lib.sh | 107 ++++++++++++++++++
.../net/bonding/mode-1-recovery-updelay.sh | 45 ++++++++
.../net/bonding/mode-2-recovery-updelay.sh | 45 ++++++++
.../selftests/drivers/net/bonding/settings | 2 +-
5 files changed, 201 insertions(+), 2 deletions(-)
create mode 100755 tools/testing/selftests/drivers/net/bonding/mode-1-recovery-updelay.sh
create mode 100755 tools/testing/selftests/drivers/net/bonding/mode-2-recovery-updelay.sh
diff --git a/tools/testing/selftests/drivers/net/bonding/Makefile b/tools/testing/selftests/drivers/net/bonding/Makefile
index 6b8d2e2f23c2..0f3921908b07 100644
--- a/tools/testing/selftests/drivers/net/bonding/Makefile
+++ b/tools/testing/selftests/drivers/net/bonding/Makefile
@@ -5,7 +5,9 @@ TEST_PROGS := \
bond-arp-interval-causes-panic.sh \
bond-break-lacpdu-tx.sh \
bond-lladdr-target.sh \
- dev_addr_lists.sh
+ dev_addr_lists.sh \
+ mode-1-recovery-updelay.sh \
+ mode-2-recovery-updelay.sh
TEST_FILES := \
lag_lib.sh \
diff --git a/tools/testing/selftests/drivers/net/bonding/lag_lib.sh b/tools/testing/selftests/drivers/net/bonding/lag_lib.sh
index 16c7fb858ac1..6dc9af1f2428 100644
--- a/tools/testing/selftests/drivers/net/bonding/lag_lib.sh
+++ b/tools/testing/selftests/drivers/net/bonding/lag_lib.sh
@@ -1,6 +1,8 @@
#!/bin/bash
# SPDX-License-Identifier: GPL-2.0
+NAMESPACES=""
+
# Test that a link aggregation device (bonding, team) removes the hardware
# addresses that it adds on its underlying devices.
test_LAG_cleanup()
@@ -59,3 +61,108 @@ test_LAG_cleanup()
log_test "$driver cleanup mode $mode"
}
+
+# Build a generic 2 node net namespace with 2 connections
+# between the namespaces
+#
+# +-----------+ +-----------+
+# | node1 | | node2 |
+# | | | |
+# | | | |
+# | eth0 +-------+ eth0 |
+# | | | |
+# | eth1 +-------+ eth1 |
+# | | | |
+# +-----------+ +-----------+
+lag_setup2x2()
+{
+ local state=${1:-down}
+ local namespaces="lag_node1 lag_node2"
+
+ # create namespaces
+ for n in ${namespaces}; do
+ ip netns add ${n}
+ done
+
+ # wire up namespaces
+ ip link add name lag1 type veth peer name lag1-end
+ ip link set dev lag1 netns lag_node1 $state name eth0
+ ip link set dev lag1-end netns lag_node2 $state name eth0
+
+ ip link add name lag1 type veth peer name lag1-end
+ ip link set dev lag1 netns lag_node1 $state name eth1
+ ip link set dev lag1-end netns lag_node2 $state name eth1
+
+ NAMESPACES="${namespaces}"
+}
+
+# cleanup all lag related namespaces and remove the bonding module
+lag_cleanup()
+{
+ for n in ${NAMESPACES}; do
+ ip netns delete ${n} >/dev/null 2>&1 || true
+ done
+ modprobe -r bonding
+}
+
+SWITCH="lag_node1"
+CLIENT="lag_node2"
+CLIENTIP="172.20.2.1"
+SWITCHIP="172.20.2.2"
+
+lag_setup_network()
+{
+ lag_setup2x2 "down"
+
+ # create switch
+ ip netns exec ${SWITCH} ip link add br0 up type bridge
+ ip netns exec ${SWITCH} ip link set eth0 master br0 up
+ ip netns exec ${SWITCH} ip link set eth1 master br0 up
+ ip netns exec ${SWITCH} ip addr add ${SWITCHIP}/24 dev br0
+}
+
+lag_reset_network()
+{
+ ip netns exec ${CLIENT} ip link del bond0
+ ip netns exec ${SWITCH} ip link set eth0 up
+ ip netns exec ${SWITCH} ip link set eth1 up
+}
+
+create_bond()
+{
+ # create client
+ ip netns exec ${CLIENT} ip link set eth0 down
+ ip netns exec ${CLIENT} ip link set eth1 down
+
+ ip netns exec ${CLIENT} ip link add bond0 type bond $@
+ ip netns exec ${CLIENT} ip link set eth0 master bond0
+ ip netns exec ${CLIENT} ip link set eth1 master bond0
+ ip netns exec ${CLIENT} ip link set bond0 up
+ ip netns exec ${CLIENT} ip addr add ${CLIENTIP}/24 dev bond0
+}
+
+test_bond_recovery()
+{
+ RET=0
+
+ create_bond $@
+
+ # verify connectivity
+ ip netns exec ${CLIENT} ping ${SWITCHIP} -c 5 >/dev/null 2>&1
+ check_err $? "No connectivity"
+
+ # force the links of the bond down
+ ip netns exec ${SWITCH} ip link set eth0 down
+ sleep 2
+ ip netns exec ${SWITCH} ip link set eth0 up
+ ip netns exec ${SWITCH} ip link set eth1 down
+
+ # re-verify connectivity
+ ip netns exec ${CLIENT} ping ${SWITCHIP} -c 5 >/dev/null 2>&1
+
+ local rc=$?
+ check_err $rc "Bond failed to recover"
+ log_test "$1 ($2) bond recovery"
+ lag_reset_network
+ return 0
+}
diff --git a/tools/testing/selftests/drivers/net/bonding/mode-1-recovery-updelay.sh b/tools/testing/selftests/drivers/net/bonding/mode-1-recovery-updelay.sh
new file mode 100755
index 000000000000..ad4c845a4ac7
--- /dev/null
+++ b/tools/testing/selftests/drivers/net/bonding/mode-1-recovery-updelay.sh
@@ -0,0 +1,45 @@
+#!/bin/sh
+# SPDX-License-Identifier: GPL-2.0
+
+# Regression Test:
+# When the bond is configured with down/updelay and the link state of
+# slave members flaps if there are no remaining members up the bond
+# should immediately select a member to bring up. (from bonding.txt
+# section 13.1 paragraph 4)
+#
+# +-------------+ +-----------+
+# | client | | switch |
+# | | | |
+# | +--------| link1 |-----+ |
+# | | +-------+ | |
+# | | | | | |
+# | | +-------+ | |
+# | | bond | link2 | Br0 | |
+# +-------------+ +-----------+
+# 172.20.2.1 172.20.2.2
+
+
+REQUIRE_MZ=no
+REQUIRE_JQ=no
+NUM_NETIFS=0
+lib_dir=$(dirname "$0")
+source "$lib_dir"/net_forwarding_lib.sh
+source "$lib_dir"/lag_lib.sh
+
+cleanup()
+{
+ lag_cleanup
+}
+
+trap cleanup 0 1 2
+
+lag_setup_network
+test_bond_recovery mode 1 miimon 100 updelay 0
+test_bond_recovery mode 1 miimon 100 updelay 200
+test_bond_recovery mode 1 miimon 100 updelay 500
+test_bond_recovery mode 1 miimon 100 updelay 1000
+test_bond_recovery mode 1 miimon 100 updelay 2000
+test_bond_recovery mode 1 miimon 100 updelay 5000
+test_bond_recovery mode 1 miimon 100 updelay 10000
+
+exit "$EXIT_STATUS"
diff --git a/tools/testing/selftests/drivers/net/bonding/mode-2-recovery-updelay.sh b/tools/testing/selftests/drivers/net/bonding/mode-2-recovery-updelay.sh
new file mode 100755
index 000000000000..2330d37453f9
--- /dev/null
+++ b/tools/testing/selftests/drivers/net/bonding/mode-2-recovery-updelay.sh
@@ -0,0 +1,45 @@
+#!/bin/sh
+# SPDX-License-Identifier: GPL-2.0
+
+# Regression Test:
+# When the bond is configured with down/updelay and the link state of
+# slave members flaps if there are no remaining members up the bond
+# should immediately select a member to bring up. (from bonding.txt
+# section 13.1 paragraph 4)
+#
+# +-------------+ +-----------+
+# | client | | switch |
+# | | | |
+# | +--------| link1 |-----+ |
+# | | +-------+ | |
+# | | | | | |
+# | | +-------+ | |
+# | | bond | link2 | Br0 | |
+# +-------------+ +-----------+
+# 172.20.2.1 172.20.2.2
+
+
+REQUIRE_MZ=no
+REQUIRE_JQ=no
+NUM_NETIFS=0
+lib_dir=$(dirname "$0")
+source "$lib_dir"/net_forwarding_lib.sh
+source "$lib_dir"/lag_lib.sh
+
+cleanup()
+{
+ lag_cleanup
+}
+
+trap cleanup 0 1 2
+
+lag_setup_network
+test_bond_recovery mode 2 miimon 100 updelay 0
+test_bond_recovery mode 2 miimon 100 updelay 200
+test_bond_recovery mode 2 miimon 100 updelay 500
+test_bond_recovery mode 2 miimon 100 updelay 1000
+test_bond_recovery mode 2 miimon 100 updelay 2000
+test_bond_recovery mode 2 miimon 100 updelay 5000
+test_bond_recovery mode 2 miimon 100 updelay 10000
+
+exit "$EXIT_STATUS"
diff --git a/tools/testing/selftests/drivers/net/bonding/settings b/tools/testing/selftests/drivers/net/bonding/settings
index 867e118223cd..6091b45d226b 100644
--- a/tools/testing/selftests/drivers/net/bonding/settings
+++ b/tools/testing/selftests/drivers/net/bonding/settings
@@ -1 +1 @@
-timeout=60
+timeout=120
--
2.31.1
When --raw_output is set (to any value), we don't actually parse the
test results. So asking to print the test results as json doesn't make
sense.
We internally create a fake test with one passing subtest, so --json
would actually print out something misleading.
This patch:
* Rewords the flag descriptions so hopefully this is more obvious.
* Also updates --raw_output's description to note the default behavior
is to print out only "KUnit" results (actually any KTAP results)
* also renames and refactors some related logic for clarity (e.g.
test_result => test, it's a kunit_parser.Test object).
Notably, this patch does not make it an error to specify --json and
--raw_output together. This is an edge case, but I know of at least one
wrapper around kunit.py that always sets --json. You'd never be able to
use --raw_output with that wrapper.
Signed-off-by: Daniel Latypov <dlatypov(a)google.com>
---
tools/testing/kunit/kunit.py | 34 ++++++++++++++++++----------------
1 file changed, 18 insertions(+), 16 deletions(-)
diff --git a/tools/testing/kunit/kunit.py b/tools/testing/kunit/kunit.py
index 4d4663fb578b..e7b6549712d6 100755
--- a/tools/testing/kunit/kunit.py
+++ b/tools/testing/kunit/kunit.py
@@ -192,12 +192,11 @@ def _map_to_overall_status(test_status: kunit_parser.TestStatus) -> KunitStatus:
def parse_tests(request: KunitParseRequest, metadata: kunit_json.Metadata, input_data: Iterable[str]) -> Tuple[KunitResult, kunit_parser.Test]:
parse_start = time.time()
- test_result = kunit_parser.Test()
-
if request.raw_output:
# Treat unparsed results as one passing test.
- test_result.status = kunit_parser.TestStatus.SUCCESS
- test_result.counts.passed = 1
+ fake_test = kunit_parser.Test()
+ fake_test.status = kunit_parser.TestStatus.SUCCESS
+ fake_test.counts.passed = 1
output: Iterable[str] = input_data
if request.raw_output == 'all':
@@ -206,14 +205,17 @@ def parse_tests(request: KunitParseRequest, metadata: kunit_json.Metadata, input
output = kunit_parser.extract_tap_lines(output, lstrip=False)
for line in output:
print(line.rstrip())
+ parse_time = time.time() - parse_start
+ return KunitResult(KunitStatus.SUCCESS, parse_time), fake_test
- else:
- test_result = kunit_parser.parse_run_tests(input_data)
- parse_end = time.time()
+
+ # Actually parse the test results.
+ test = kunit_parser.parse_run_tests(input_data)
+ parse_time = time.time() - parse_start
if request.json:
json_str = kunit_json.get_json_result(
- test=test_result,
+ test=test,
metadata=metadata)
if request.json == 'stdout':
print(json_str)
@@ -223,10 +225,10 @@ def parse_tests(request: KunitParseRequest, metadata: kunit_json.Metadata, input
stdout.print_with_timestamp("Test results stored in %s" %
os.path.abspath(request.json))
- if test_result.status != kunit_parser.TestStatus.SUCCESS:
- return KunitResult(KunitStatus.TEST_FAILURE, parse_end - parse_start), test_result
+ if test.status != kunit_parser.TestStatus.SUCCESS:
+ return KunitResult(KunitStatus.TEST_FAILURE, parse_time), test
- return KunitResult(KunitStatus.SUCCESS, parse_end - parse_start), test_result
+ return KunitResult(KunitStatus.SUCCESS, parse_time), test
def run_tests(linux: kunit_kernel.LinuxSourceTree,
request: KunitRequest) -> KunitResult:
@@ -359,14 +361,14 @@ def add_exec_opts(parser) -> None:
choices=['suite', 'test'])
def add_parse_opts(parser) -> None:
- parser.add_argument('--raw_output', help='If set don\'t format output from kernel. '
- 'If set to --raw_output=kunit, filters to just KUnit output.',
+ parser.add_argument('--raw_output', help='If set don\'t parse output from kernel. '
+ 'By default, filters to just KUnit output. Use '
+ '--raw_output=all to show everything',
type=str, nargs='?', const='all', default=None, choices=['all', 'kunit'])
parser.add_argument('--json',
nargs='?',
- help='Stores test results in a JSON, and either '
- 'prints to stdout or saves to file if a '
- 'filename is specified',
+ help='Prints parsed test results as JSON to stdout or a file if '
+ 'a filename is specified. Does nothing if --raw_output is set.',
type=str, const='stdout', default=None, metavar='FILE')
base-commit: 870f63b7cd78d0055902d839a60408f7428b4e84
--
2.38.1.584.g0f3c55d4c2-goog
This series is a follow-up to [0]. patch 1 updates sk_storage_map_test to
ensure special map value fields are not copied between user and kernel.
patch 0 fixes a bug found by the updated test.
[0] https://lore.kernel.org/bpf/1ca2e4e8-ed7e-9174-01f6-c14539b8b8b2@huawei.com/
Xu Kuohai (2):
bpf: Do not copy spin lock field from user in bpf_selem_alloc
bpf: Set and check spin lock value in sk_storage_map_test
kernel/bpf/bpf_local_storage.c | 2 +-
.../selftests/bpf/map_tests/sk_storage_map.c | 36 ++++++++++---------
2 files changed, 21 insertions(+), 17 deletions(-)
--
2.30.2