Clean up the KVM clock mess somewhat so that it is either based on the guest
TSC ("master clock" mode), or on the host CLOCK_MONOTONIC_RAW in cases where
the TSC isn't usable.
Eliminate the third variant where it was based directly on the *host* TSC,
due to bugs in e.g. __get_kvmclock().
Kill off the last vestiges of the KVM clock being based on CLOCK_MONOTONIC
instead of CLOCK_MONOTONIC_RAW and thus being subject to NTP skew.
Fix up migration support to allow the KVM clock to be saved/restored as an
arithmetic function of the guest TSC, since that's what it actually is in
the *common* case so it can be migrated precisely. Or at least to within
±1 ns which is good enough, as discussed in
https://lore.kernel.org/kvm/c8dca08bf848e663f192de6705bf04aa3966e856.camel@…
In v2 of this series, TSC synchronization is improved and simplified a bit
too, and we allow masterclock mode to be used even when the guest TSCs are
out of sync, as long as they're running at the same *rate*. The different
*offset* shouldn't matter.
And the kvm_get_time_scale() function annoyed me by being entirely opaque,
so I studied it until my brain hurt and then added some comments.
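For readers unfamiliar with the arithmetic involved, here is a minimal sketch of how a TSC delta becomes nanoseconds under the pvclock ABI: apply a shift, then multiply by a 32-bit fixed-point fraction. The 3 GHz TSC frequency and the resulting multiplier/shift pair are assumptions chosen for illustration, and the function below is not the kernel's code.

```shell
#!/bin/sh
# Illustrative only: scale a TSC delta to nanoseconds in the pvclock style:
# shift first, then multiply by a 32-bit fixed-point fraction (mul / 2^32).
# Needs 64-bit shell arithmetic; keep deltas small so the intermediate
# product does not overflow.
scale_delta() {
    delta=$1    # TSC cycles elapsed
    mul=$2      # fixed-point multiplier
    nshift=$3   # shift; negative means shift right
    if [ "$nshift" -lt 0 ]; then
        delta=$(( delta >> (0 - nshift) ))
    else
        delta=$(( delta << nshift ))
    fi
    echo $(( (delta * mul) >> 32 ))
}

# Assumed 3 GHz TSC: shift = -1 and mul = floor(2^32 * 2 / 3).
# 3000000 cycles is exactly 1 ms of wall time at that rate.
scale_delta 3000000 2863311530 -1
```

With these assumed values the result is 999999 rather than 1000000, because the multiplier itself is rounded down. This shows only one rounding source in the arithmetic, not the full accuracy analysis from the linked thread.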
In v2 I also dropped the commits which were removing the periodic clock
syncs. Those are going to be needed still but *only* for non-masterclock
mode, which I'll do next. Along with ensuring that a masterclock update
while already in masterclock mode doesn't jump the clock, and just does
the same as KVM_SET_CLOCK_GUEST does to preserve it.
Needs a *lot* more testing. I think I'm almost done refactoring the code,
so should focus on building up the tests next.
(I do still hate that we're abusing KVM_GET_CLOCK just to get the tuple
of {host_tsc, CLOCK_REALTIME} without even *caring* about the eponymous
KVM clock. Especially as this information is (a) fundamentally what the
vDSO gettimeofday() exposes to us anyway, (b) using CLOCK_REALTIME not
TAI, (c) not available on other platforms, for example for migrating
the Arm arch counter.)
David Woodhouse (13):
KVM: x86/xen: Do not corrupt KVM clock in kvm_xen_shared_info_init()
KVM: x86: Improve accuracy of KVM clock when TSC scaling is in force
KVM: x86: Explicitly disable TSC scaling without CONSTANT_TSC
KVM: x86: Add KVM_VCPU_TSC_SCALE and fix the documentation on TSC migration
KVM: x86: Avoid NTP frequency skew for KVM clock on 32-bit host
KVM: x86: Fix KVM clock precision in __get_kvmclock()
KVM: x86: Fix software TSC upscaling in kvm_update_guest_time()
KVM: x86: Simplify and comment kvm_get_time_scale()
KVM: x86: Remove implicit rdtsc() from kvm_compute_l1_tsc_offset()
KVM: x86: Improve synchronization in kvm_synchronize_tsc()
KVM: x86: Kill cur_tsc_{nsec,offset,write} fields
KVM: x86: Allow KVM master clock mode when TSCs are offset from each other
KVM: x86: Factor out kvm_use_master_clock()
Jack Allister (2):
KVM: x86: Add KVM_[GS]ET_CLOCK_GUEST for accurate KVM clock migration
KVM: selftests: Add KVM/PV clock selftest to prove timer correction
Documentation/virt/kvm/api.rst | 37 ++
Documentation/virt/kvm/devices/vcpu.rst | 115 +++-
arch/x86/include/asm/kvm_host.h | 15 +-
arch/x86/include/uapi/asm/kvm.h | 6 +
arch/x86/kvm/svm/svm.c | 3 +-
arch/x86/kvm/vmx/vmx.c | 2 +-
arch/x86/kvm/x86.c | 687 +++++++++++++++-------
arch/x86/kvm/xen.c | 4 +-
include/uapi/linux/kvm.h | 3 +
tools/testing/selftests/kvm/Makefile | 1 +
tools/testing/selftests/kvm/x86_64/pvclock_test.c | 192 ++++++
11 files changed, 822 insertions(+), 243 deletions(-)
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Mark Rutland <mark.rutland(a)arm.com>
[ Upstream commit 8ecab2e64572f1aecdfc5a8feae748abda6e3347 ]
The event filter function test has been failing in our internal test
farm:
| # not ok 33 event filter function - test event filtering on functions
Running the test in verbose mode indicates that this is because the test
erroneously determines that kmem_cache_free() is the most common caller
of kmem_cache_free():
# # + cut -d: -f3 trace
# # + sed s/call_site=([^+]*)+0x.*/1/
# # + sort
# # + uniq -c
# # + sort
# # + tail -n 1
# # + sed s/^[ 0-9]*//
# # + target_func=kmem_cache_free
... and as kmem_cache_free() doesn't call itself, setting this as the
filter function for kmem_cache_free() results in no hits, and
consequently the test fails:
# # + grep kmem_cache_free trace
# # + grep kmem_cache_free
# # + wc -l
# # + hitcnt=0
# # + grep kmem_cache_free trace
# # + grep -v kmem_cache_free
# # + wc -l
# # + misscnt=0
# # + [ 0 -eq 0 ]
# # + exit_fail
This seems to be because the system in question has tasks with ':' in
their name (which a number of kernel worker threads have). These show up
in the trace, e.g.
test:.sh-1299 [004] ..... 2886.040608: kmem_cache_free: call_site=putname+0xa4/0xc8 ptr=000000000f4d22f4 name=names_cache
... and so when we try to extract the call_site with:
cut -d: -f3 trace | sed 's/call_site=\([^+]*\)+0x.*/\1/'
... the 'cut' command will extract the column containing
'kmem_cache_free' rather than the column containing 'call_site=...', and
the 'sed' command will leave this unchanged. Consequently, the test will
decide to use 'kmem_cache_free' as the filter function, resulting in the
failure seen above.
Fix this by matching the 'call_site=<func>' part specifically to extract
the function name.
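The difference between the two extraction approaches can be reproduced on the sample trace line quoted above. This is a minimal sketch: the function names old_extract and new_extract are hypothetical, and the counting stages (sort | uniq -c | sort | tail) are left out so that only the field selection is compared.

```shell
#!/bin/sh
# Sample trace line copied from the commit message (task name contains ':').
line='test:.sh-1299 [004] ..... 2886.040608: kmem_cache_free: call_site=putname+0xa4/0xc8 ptr=000000000f4d22f4 name=names_cache'

# Old extraction: assumes call_site= sits in the third ':'-separated field.
# The last sed only trims leading blanks/digits, as in the test script.
old_extract() {
    cut -d: -f3 | sed 's/call_site=\([^+]*\)+0x.*/\1/' | sed 's/^[ 0-9]*//'
}

# New extraction: match the call_site=<func> token wherever it appears.
new_extract() {
    grep -o 'call_site=\([^+]*\)' | sed 's/call_site=//' | sed 's/^[ 0-9]*//'
}

printf '%s\n' "$line" | old_extract   # yields the event name
printf '%s\n' "$line" | new_extract   # yields the calling function
```

On this line the old pipeline prints kmem_cache_free and the new one prints putname, because the extra ':' in the task name shifts every field that cut sees by one.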
Signed-off-by: Mark Rutland <mark.rutland(a)arm.com>
Reported-by: Aishwarya TCV <aishwarya.tcv(a)arm.com>
Cc: Masami Hiramatsu <mhiramat(a)kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers(a)efficios.com>
Cc: Shuah Khan <shuah(a)kernel.org>
Cc: Steven Rostedt <rostedt(a)goodmis.org>
Cc: linux-kernel(a)vger.kernel.org
Cc: linux-kselftest(a)vger.kernel.org
Cc: linux-trace-kernel(a)vger.kernel.org
Acked-by: Masami Hiramatsu (Google) <mhiramat(a)kernel.org>
Signed-off-by: Shuah Khan <skhan(a)linuxfoundation.org>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
.../selftests/ftrace/test.d/filter/event-filter-function.tc | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tools/testing/selftests/ftrace/test.d/filter/event-filter-function.tc b/tools/testing/selftests/ftrace/test.d/filter/event-filter-function.tc
index 2de7c61d1ae30..3f74c09c56b62 100644
--- a/tools/testing/selftests/ftrace/test.d/filter/event-filter-function.tc
+++ b/tools/testing/selftests/ftrace/test.d/filter/event-filter-function.tc
@@ -24,7 +24,7 @@ echo 0 > events/enable
echo "Get the most frequently calling function"
sample_events
-target_func=`cut -d: -f3 trace | sed 's/call_site=\([^+]*\)+0x.*/\1/' | sort | uniq -c | sort | tail -n 1 | sed 's/^[ 0-9]*//'`
+target_func=`cat trace | grep -o 'call_site=\([^+]*\)' | sed 's/call_site=//' | sort | uniq -c | sort | tail -n 1 | sed 's/^[ 0-9]*//'`
if [ -z "$target_func" ]; then
exit_fail
fi
--
2.43.0
6.8-stable review patch. If anyone has any objections, please let me know.
------------------
From: Mark Rutland <mark.rutland(a)arm.com>
[ Upstream commit 8ecab2e64572f1aecdfc5a8feae748abda6e3347 ]
The event filter function test has been failing in our internal test
farm:
| # not ok 33 event filter function - test event filtering on functions
Running the test in verbose mode indicates that this is because the test
erroneously determines that kmem_cache_free() is the most common caller
of kmem_cache_free():
# # + cut -d: -f3 trace
# # + sed s/call_site=([^+]*)+0x.*/1/
# # + sort
# # + uniq -c
# # + sort
# # + tail -n 1
# # + sed s/^[ 0-9]*//
# # + target_func=kmem_cache_free
... and as kmem_cache_free() doesn't call itself, setting this as the
filter function for kmem_cache_free() results in no hits, and
consequently the test fails:
# # + grep kmem_cache_free trace
# # + grep kmem_cache_free
# # + wc -l
# # + hitcnt=0
# # + grep kmem_cache_free trace
# # + grep -v kmem_cache_free
# # + wc -l
# # + misscnt=0
# # + [ 0 -eq 0 ]
# # + exit_fail
This seems to be because the system in question has tasks with ':' in
their name (which a number of kernel worker threads have). These show up
in the trace, e.g.
test:.sh-1299 [004] ..... 2886.040608: kmem_cache_free: call_site=putname+0xa4/0xc8 ptr=000000000f4d22f4 name=names_cache
... and so when we try to extract the call_site with:
cut -d: -f3 trace | sed 's/call_site=\([^+]*\)+0x.*/\1/'
... the 'cut' command will extract the column containing
'kmem_cache_free' rather than the column containing 'call_site=...', and
the 'sed' command will leave this unchanged. Consequently, the test will
decide to use 'kmem_cache_free' as the filter function, resulting in the
failure seen above.
Fix this by matching the 'call_site=<func>' part specifically to extract
the function name.
Signed-off-by: Mark Rutland <mark.rutland(a)arm.com>
Reported-by: Aishwarya TCV <aishwarya.tcv(a)arm.com>
Cc: Masami Hiramatsu <mhiramat(a)kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers(a)efficios.com>
Cc: Shuah Khan <shuah(a)kernel.org>
Cc: Steven Rostedt <rostedt(a)goodmis.org>
Cc: linux-kernel(a)vger.kernel.org
Cc: linux-kselftest(a)vger.kernel.org
Cc: linux-trace-kernel(a)vger.kernel.org
Acked-by: Masami Hiramatsu (Google) <mhiramat(a)kernel.org>
Signed-off-by: Shuah Khan <skhan(a)linuxfoundation.org>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
.../selftests/ftrace/test.d/filter/event-filter-function.tc | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tools/testing/selftests/ftrace/test.d/filter/event-filter-function.tc b/tools/testing/selftests/ftrace/test.d/filter/event-filter-function.tc
index 2de7c61d1ae30..3f74c09c56b62 100644
--- a/tools/testing/selftests/ftrace/test.d/filter/event-filter-function.tc
+++ b/tools/testing/selftests/ftrace/test.d/filter/event-filter-function.tc
@@ -24,7 +24,7 @@ echo 0 > events/enable
echo "Get the most frequently calling function"
sample_events
-target_func=`cut -d: -f3 trace | sed 's/call_site=\([^+]*\)+0x.*/\1/' | sort | uniq -c | sort | tail -n 1 | sed 's/^[ 0-9]*//'`
+target_func=`cat trace | grep -o 'call_site=\([^+]*\)' | sed 's/call_site=//' | sort | uniq -c | sort | tail -n 1 | sed 's/^[ 0-9]*//'`
if [ -z "$target_func" ]; then
exit_fail
fi
--
2.43.0
After this change the single SAN device (ns3eth1) is now replaced with
two SAN devices - respectively ns4eth1 and ns5eth1.
It is possible to extend this script to have more SAN devices connected
by adding them to ns3br1 bridge.
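As a rough sketch of such an extension, the helper below prints (it does not execute) the commands that would attach one more SAN to ns3br1, following the pattern the patch uses for ns4 and ns5. The helper name print_extra_san, the namespace variable ns6, the interface names ns3eth4/ns6eth1 and the address 100.64.0.61 are all hypothetical.

```shell
#!/bin/sh
# Dry run: print the commands for one additional SAN behind the ns3br1
# bridge. Nothing is executed, so no root privileges or namespaces are
# needed to try this out.
print_extra_san() {
    idx=$1      # SAN namespace index, e.g. 6 (hypothetical)
    port=$2     # bridge-side port index in ns3, e.g. 4 (hypothetical)
    addr=$3     # IPv4 address for the SAN, e.g. 100.64.0.61 (hypothetical)
    cat <<EOF
ip link add ns3eth${port} netns "\${ns3}" type veth peer name ns${idx}eth1 netns "\${ns${idx}}"
ip -n "\${ns3}" link set ns3eth${port} master ns3br1 up
ip -n "\${ns${idx}}" link set ns${idx}eth1 up
ip -n "\${ns${idx}}" addr add ${addr}/24 dev ns${idx}eth1
EOF
}

print_extra_san 6 4 100.64.0.61
```

A real extension would also need the new namespace added to the setup_ns call and matching do_ping lines in do_complete_ping_test().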
Signed-off-by: Lukasz Majewski <lukma(a)denx.de>
---
tools/testing/selftests/net/hsr/hsr_redbox.sh | 71 +++++++++++++------
1 file changed, 49 insertions(+), 22 deletions(-)
diff --git a/tools/testing/selftests/net/hsr/hsr_redbox.sh b/tools/testing/selftests/net/hsr/hsr_redbox.sh
index db69be95ecb3..1f36785347c0 100755
--- a/tools/testing/selftests/net/hsr/hsr_redbox.sh
+++ b/tools/testing/selftests/net/hsr/hsr_redbox.sh
@@ -8,12 +8,19 @@ source ./hsr_common.sh
do_complete_ping_test()
{
echo "INFO: Initial validation ping (HSR-SAN/RedBox)."
- # Each node has to be able each one.
+ # Each node has to be able to reach each one.
do_ping "${ns1}" 100.64.0.2
do_ping "${ns2}" 100.64.0.1
- # Ping from SAN to hsr1 (via hsr2)
+ # Ping between SANs (test bridge)
+ do_ping "${ns4}" 100.64.0.51
+ do_ping "${ns5}" 100.64.0.41
+ # Ping from SANs to hsr1 (via hsr2) (and opposite)
do_ping "${ns3}" 100.64.0.1
do_ping "${ns1}" 100.64.0.3
+ do_ping "${ns1}" 100.64.0.41
+ do_ping "${ns4}" 100.64.0.1
+ do_ping "${ns1}" 100.64.0.51
+ do_ping "${ns5}" 100.64.0.1
stop_if_error "Initial validation failed."
# Wait for MGNT HSR frames being received and nodes being
@@ -23,8 +30,12 @@ do_complete_ping_test()
echo "INFO: Longer ping test (HSR-SAN/RedBox)."
# Ping from SAN to hsr1 (via hsr2)
do_ping_long "${ns3}" 100.64.0.1
- # Ping from hsr1 (via hsr2) to SAN
+ # Ping from hsr1 (via hsr2) to SANs (and opposite)
do_ping_long "${ns1}" 100.64.0.3
+ do_ping_long "${ns1}" 100.64.0.41
+ do_ping_long "${ns4}" 100.64.0.1
+ do_ping_long "${ns1}" 100.64.0.51
+ do_ping_long "${ns5}" 100.64.0.1
stop_if_error "Longer ping test failed."
echo "INFO: All good."
@@ -35,22 +46,26 @@ setup_hsr_interfaces()
local HSRv="$1"
echo "INFO: preparing interfaces for HSRv${HSRv} (HSR-SAN/RedBox)."
-
-# |NS1 |
-# | |
-# | /-- hsr1 --\ |
-# | ns1eth1 ns1eth2 |
-# |------------------------|
-# | |
-# | |
-# | |
-# |------------------------| |-----------|
-# | ns2eth1 ns2eth2 | | |
-# | \-- hsr2 --/ | | |
-# | \ | | |
-# | ns2eth3 |--------| ns3eth1 |
-# | (interlink)| | |
-# |NS2 (RedBOX) | |NS3 (SAN) |
+#
+# IPv4 addresses (100.64.X.Y/24), and [X.Y] is presented on below diagram:
+#
+#
+# |NS1 | |NS4 |
+# | [0.1] | | |
+# | /-- hsr1 --\ | | [0.41] |
+# | ns1eth1 ns1eth2 | | ns4eth1 (SAN) |
+# |------------------------| |-------------------|
+# | | |
+# | | |
+# | | |
+# |------------------------| |-------------------------------|
+# | ns2eth1 ns2eth2 | | ns3eth2 |
+# | \-- hsr2 --/ | | / |
+# | [0.2] \ | | / | |------------|
+# | ns2eth3 |---| ns3eth1 -- ns3br1 -- ns3eth3--|--| ns5eth1 |
+# | (interlink)| | [0.3] [0.11] | | [0.51] |
+# |NS2 (RedBOX) | |NS3 (BR) | | NS5 (SAN) |
+#
#
# Check if iproute2 supports adding interlink port to hsrX device
ip link help hsr | grep -q INTERLINK
@@ -59,7 +74,9 @@ setup_hsr_interfaces()
# Create interfaces for name spaces
ip link add ns1eth1 netns "${ns1}" type veth peer name ns2eth1 netns "${ns2}"
ip link add ns1eth2 netns "${ns1}" type veth peer name ns2eth2 netns "${ns2}"
- ip link add ns3eth1 netns "${ns3}" type veth peer name ns2eth3 netns "${ns2}"
+ ip link add ns2eth3 netns "${ns2}" type veth peer name ns3eth1 netns "${ns3}"
+ ip link add ns3eth2 netns "${ns3}" type veth peer name ns4eth1 netns "${ns4}"
+ ip link add ns3eth3 netns "${ns3}" type veth peer name ns5eth1 netns "${ns5}"
sleep 1
@@ -70,21 +87,31 @@ setup_hsr_interfaces()
ip -n "${ns2}" link set ns2eth2 up
ip -n "${ns2}" link set ns2eth3 up
- ip -n "${ns3}" link set ns3eth1 up
+ ip -n "${ns3}" link add name ns3br1 type bridge
+ ip -n "${ns3}" link set ns3br1 up
+ ip -n "${ns3}" link set ns3eth1 master ns3br1 up
+ ip -n "${ns3}" link set ns3eth2 master ns3br1 up
+ ip -n "${ns3}" link set ns3eth3 master ns3br1 up
+
+ ip -n "${ns4}" link set ns4eth1 up
+ ip -n "${ns5}" link set ns5eth1 up
ip -net "${ns1}" link add name hsr1 type hsr slave1 ns1eth1 slave2 ns1eth2 supervision 45 version ${HSRv} proto 0
ip -net "${ns2}" link add name hsr2 type hsr slave1 ns2eth1 slave2 ns2eth2 interlink ns2eth3 supervision 45 version ${HSRv} proto 0
ip -n "${ns1}" addr add 100.64.0.1/24 dev hsr1
ip -n "${ns2}" addr add 100.64.0.2/24 dev hsr2
+ ip -n "${ns3}" addr add 100.64.0.11/24 dev ns3br1
ip -n "${ns3}" addr add 100.64.0.3/24 dev ns3eth1
+ ip -n "${ns4}" addr add 100.64.0.41/24 dev ns4eth1
+ ip -n "${ns5}" addr add 100.64.0.51/24 dev ns5eth1
ip -n "${ns1}" link set hsr1 up
ip -n "${ns2}" link set hsr2 up
}
check_prerequisites
-setup_ns ns1 ns2 ns3
+setup_ns ns1 ns2 ns3 ns4 ns5
trap cleanup_all_ns EXIT
--
2.20.1
Joachim kindly merged the IPv6 support in
https://github.com/troglobit/mtools/pull/2, so we can just use his
version now. A few more fixes subsequently came in for IPv6, so even
better.
Check that the deployed mtools version is 3.0 or above. Note that the
version check breaks compatibility with my fork where I didn't bump the
version, but I assume that won't be a problem.
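The parsing step of the version check can be tried in isolation. This is a minimal sketch that assumes the banner has the form "msend version X.Y", which is what the parameter expansion in the patch expects. The helper name mtools_major is hypothetical, and it takes the banner as an argument instead of running msend.

```shell
#!/bin/sh
# Extract the major version from a banner of the assumed form
# "msend version X.Y".
mtools_major() {
    version=${1##msend version }        # strip the leading banner text
    echo "$version" | cut -d. -f1       # keep everything before the first '.'
}

mtools_major "msend version 3.0"
mtools_major "msend version 2.3"
```

If the banner ever deviates from the assumed form, the result is not a number and the subsequent `[ $major -lt 3 ]` comparison reports an error instead of skipping cleanly.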
Signed-off-by: Vladimir Oltean <vladimir.oltean(a)nxp.com>
---
 tools/testing/selftests/net/forwarding/lib.sh | 19 +++++++++++++++++--
1 file changed, 17 insertions(+), 2 deletions(-)
diff --git a/tools/testing/selftests/net/forwarding/lib.sh b/tools/testing/selftests/net/forwarding/lib.sh
index 4fe28ab5d8b9..aa925c0954a5 100644
--- a/tools/testing/selftests/net/forwarding/lib.sh
+++ b/tools/testing/selftests/net/forwarding/lib.sh
@@ -309,6 +309,21 @@ require_command()
fi
}
+# IPv6 support was added in v3.0
+check_mtools_version()
+{
+ local version="$(msend -v)"
+ local major
+
+ version=${version##msend version }
+ major=$(echo $version | cut -d. -f1)
+
+ if [ $major -lt 3 ]; then
+ echo "SKIP: expected mtools version 3.0, got $version"
+ exit $ksft_skip
+ fi
+}
+
if [[ "$REQUIRE_JQ" = "yes" ]]; then
require_command jq
fi
@@ -316,10 +331,10 @@ if [[ "$REQUIRE_MZ" = "yes" ]]; then
require_command $MZ
fi
if [[ "$REQUIRE_MTOOLS" = "yes" ]]; then
- # https://github.com/vladimiroltean/mtools/
- # patched for IPv6 support
+ # https://github.com/troglobit/mtools
require_command msend
require_command mreceive
+ check_mtools_version
fi
##############################################################################
--
2.34.1
Hi Linus,
Without reply from Shuah, and given the importance of these fixes [1], here is
a PR to fix Kselftest (broken since v6.9-rc1) for at least KVM, pidfd, and
Landlock. I cannot test against all kselftests though. This has been in
linux-next since the beginning of this week, and so far only one issue has been
reported [2] and fixed [3].
Feel free to take this PR if you see fit.
Regards,
Mickaël
[1] https://lore.kernel.org/r/Zjo1xyhjmehsRhZ2@google.com
[2] https://lore.kernel.org/r/202405100339.vfBe0t9C-lkp@intel.com
[3] https://lore.kernel.org/r/20240511171445.904356-1-mic@digikod.net
--
The following changes since commit e67572cd2204894179d89bd7b984072f19313b03:
Linux 6.9-rc6 (2024-04-28 13:47:24 -0700)
are available in the Git repository at:
git://git.kernel.org/pub/scm/linux/kernel/git/mic/linux.git tags/kselftest-fix-vfork-2024-05-12
for you to fetch changes up to 323feb3bdb67649bfa5614eb24ec9cb92a60cf33:
selftests/harness: Handle TEST_F()'s explicit exit codes (2024-05-11 19:18:47 +0200)
----------------------------------------------------------------
Fix Kselftest's vfork() side effects
See https://lore.kernel.org/r/20240511171445.904356-1-mic@digikod.net
----------------------------------------------------------------
Mickaël Salaün (10):
selftests/pidfd: Fix config for pidfd_setns_test
selftests/landlock: Fix FS tests when run on a private mount point
selftests/harness: Fix fixture teardown
selftests/harness: Fix interleaved scheduling leading to race conditions
selftests/landlock: Do not allocate memory in fixture data
selftests/harness: Constify fixture variants
selftests/pidfd: Fix wrong expectation
selftests/harness: Share _metadata between forked processes
selftests/harness: Fix vfork() side effects
selftests/harness: Handle TEST_F()'s explicit exit codes
tools/testing/selftests/kselftest_harness.h | 127 +++++++++++++++++------
tools/testing/selftests/landlock/fs_test.c | 83 +++++++++------
tools/testing/selftests/pidfd/config | 2 +
tools/testing/selftests/pidfd/pidfd_setns_test.c | 2 +-
4 files changed, 147 insertions(+), 67 deletions(-)
Currently, if no huge page can be allocated at runtime, the test trivially
passes on AArch64, because the division by zero while computing
compaction_index raises no exception there. Fix that by checking for
nr_hugepages == 0. More generally, avoid the division by zero on any
architecture by bailing out before it. While at it, fix a typo.
Signed-off-by: Dev Jain <dev.jain(a)arm.com>
---
tools/testing/selftests/mm/compaction_test.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)
diff --git a/tools/testing/selftests/mm/compaction_test.c b/tools/testing/selftests/mm/compaction_test.c
index 533999b6c284..df1b76f9c734 100644
--- a/tools/testing/selftests/mm/compaction_test.c
+++ b/tools/testing/selftests/mm/compaction_test.c
@@ -134,6 +134,10 @@ int check_compaction(unsigned long mem_free, unsigned int hugepage_size)
/* We should have been able to request at least 1/3 rd of the memory in
huge pages */
+ if (!atoi(nr_hugepages)) {
+ ksft_print_msg("ERROR: No memory is available as huge pages\n");
+ goto close_fd;
+ }
compaction_index = mem_free/(atoi(nr_hugepages) * hugepage_size);
lseek(fd, 0, SEEK_SET);
@@ -149,7 +153,7 @@ int check_compaction(unsigned long mem_free, unsigned int hugepage_size)
atoi(nr_hugepages));
if (compaction_index > 3) {
- ksft_print_msg("ERROR: Less that 1/%d of memory is available\n"
+ ksft_print_msg("ERROR: Less than 1/%d of memory is available\n"
"as huge pages\n", compaction_index);
goto close_fd;
}
--
2.39.2