In order to use run_kselftest.sh the list of tests must be emitted to
populate kselftest-list.txt.
The powerpc Makefile is written to use EMIT_TESTS. But support for
EMIT_TESTS was dropped in commit d4e59a536f50 ("selftests: Use runner.sh
for emit targets"). Although prior to that commit a548de0fe8e1
("selftests: lib.mk: add test execute bit check to EMIT_TESTS") had
already broken run_kselftest.sh for powerpc due to the executable check
using the wrong path.
It can be fixed by replacing the EMIT_TESTS definitions with actual
emit_tests rules in the powerpc Makefiles. This makes run_kselftest.sh
able to run powerpc tests:
$ cd linux
$ export ARCH=powerpc
$ export CROSS_COMPILE=powerpc64le-linux-gnu-
$ make headers
$ make -j -C tools/testing/selftests install
$ grep -c "^powerpc" tools/testing/selftests/kselftest_install/kselftest-list.txt
182
Fixes: d4e59a536f50 ("selftests: Use runner.sh for emit targets")
Signed-off-by: Michael Ellerman <mpe(a)ellerman.id.au>
---
tools/testing/selftests/powerpc/Makefile | 7 +++----
tools/testing/selftests/powerpc/pmu/Makefile | 11 ++++++-----
2 files changed, 9 insertions(+), 9 deletions(-)
I'll plan to merge this via the powerpc tree.
cheers
diff --git a/tools/testing/selftests/powerpc/Makefile b/tools/testing/selftests/powerpc/Makefile
index 49f2ad1793fd..7ea42fa02eab 100644
--- a/tools/testing/selftests/powerpc/Makefile
+++ b/tools/testing/selftests/powerpc/Makefile
@@ -59,12 +59,11 @@ override define INSTALL_RULE
done;
endef
-override define EMIT_TESTS
+emit_tests:
+@for TARGET in $(SUB_DIRS); do \
BUILD_TARGET=$(OUTPUT)/$$TARGET; \
- $(MAKE) OUTPUT=$$BUILD_TARGET -s -C $$TARGET emit_tests;\
+ $(MAKE) OUTPUT=$$BUILD_TARGET -s -C $$TARGET $@;\
done;
-endef
override define CLEAN
+@for TARGET in $(SUB_DIRS); do \
@@ -77,4 +76,4 @@ endef
tags:
find . -name '*.c' -o -name '*.h' | xargs ctags
-.PHONY: tags $(SUB_DIRS)
+.PHONY: tags $(SUB_DIRS) emit_tests
diff --git a/tools/testing/selftests/powerpc/pmu/Makefile b/tools/testing/selftests/powerpc/pmu/Makefile
index 2b95e44d20ff..a284fa874a9f 100644
--- a/tools/testing/selftests/powerpc/pmu/Makefile
+++ b/tools/testing/selftests/powerpc/pmu/Makefile
@@ -30,13 +30,14 @@ override define RUN_TESTS
+TARGET=event_code_tests; BUILD_TARGET=$$OUTPUT/$$TARGET; $(MAKE) OUTPUT=$$BUILD_TARGET -C $$TARGET run_tests
endef
-DEFAULT_EMIT_TESTS := $(EMIT_TESTS)
-override define EMIT_TESTS
- $(DEFAULT_EMIT_TESTS)
+emit_tests:
+ for TEST in $(TEST_GEN_PROGS); do \
+ BASENAME_TEST=`basename $$TEST`; \
+ echo "$(COLLECTION):$$BASENAME_TEST"; \
+ done
+TARGET=ebb; BUILD_TARGET=$$OUTPUT/$$TARGET; $(MAKE) OUTPUT=$$BUILD_TARGET -s -C $$TARGET emit_tests
+TARGET=sampling_tests; BUILD_TARGET=$$OUTPUT/$$TARGET; $(MAKE) OUTPUT=$$BUILD_TARGET -s -C $$TARGET emit_tests
+TARGET=event_code_tests; BUILD_TARGET=$$OUTPUT/$$TARGET; $(MAKE) OUTPUT=$$BUILD_TARGET -s -C $$TARGET emit_tests
-endef
DEFAULT_INSTALL_RULE := $(INSTALL_RULE)
override define INSTALL_RULE
@@ -64,4 +65,4 @@ sampling_tests:
event_code_tests:
TARGET=$@; BUILD_TARGET=$$OUTPUT/$$TARGET; mkdir -p $$BUILD_TARGET; $(MAKE) OUTPUT=$$BUILD_TARGET -k -C $$TARGET all
-.PHONY: all run_tests ebb sampling_tests event_code_tests
+.PHONY: all run_tests ebb sampling_tests event_code_tests emit_tests
--
2.41.0
Rework how KVM limits guest-unsupported xfeatures to effectively hide
only when saving state for userspace (KVM_GET_XSAVE), i.e. to let userspace
load all host-supported xfeatures (via KVM_SET_XSAVE) irrespective of
what features have been exposed to the guest.
The effect on KVM_SET_XSAVE was knowingly done by commit ad856280ddea
("x86/kvm/fpu: Limit guest user_xfeatures to supported bits of XCR0"):
As a bonus, it will also fail if userspace tries to set fpu features
(with the KVM_SET_XSAVE ioctl) that are not compatible to the guest
configuration. Such features will never be returned by KVM_GET_XSAVE
or KVM_GET_XSAVE2.
Peventing userspace from doing stupid things is usually a good idea, but in
this case restricting KVM_SET_XSAVE actually exacerbated the problem that
commit ad856280ddea was fixing. As reported by Tyler, rejecting KVM_SET_XSAVE
for guest-unsupported xfeatures breaks live migration from a kernel without
commit ad856280ddea, to a kernel with ad856280ddea. I.e. from a kernel that
saves guest-unsupported xfeatures to a kernel that doesn't allow loading
guest-unuspported xfeatures.
To make matters even worse, QEMU doesn't terminate if KVM_SET_XSAVE fails,
and so the end result is that the live migration results (possibly silent)
guest data corruption instead of a failed migration.
Patch 1 refactors the FPU code to let KVM pass in a mask of which xfeatures
to save, patch 2 fixes KVM by passing in guest_supported_xcr0 instead of
modifying user_xfeatures directly.
Patches 3-5 are regression tests.
I have no objection if anyone wants patches 1 and 2 squashed together, I
split them purely to make review easier.
Note, this doesn't fix the scenario where a guest is migrated from a "bad"
to a "good" kernel and the target host doesn't support the over-saved set
of xfeatures. I don't see a way to safely handle that in the kernel without
an opt-in, which more or less defeats the purpose of handling it in KVM.
Sean Christopherson (5):
x86/fpu: Allow caller to constrain xfeatures when copying to uabi
buffer
KVM: x86: Constrain guest-supported xfeatures only at KVM_GET_XSAVE{2}
KVM: selftests: Touch relevant XSAVE state in guest for state test
KVM: selftests: Load XSAVE state into untouched vCPU during state test
KVM: selftests: Force load all supported XSAVE state in state test
arch/x86/include/asm/fpu/api.h | 3 +-
arch/x86/kernel/fpu/core.c | 5 +-
arch/x86/kernel/fpu/xstate.c | 12 +-
arch/x86/kernel/fpu/xstate.h | 3 +-
arch/x86/kvm/cpuid.c | 8 --
arch/x86/kvm/x86.c | 37 +++---
.../selftests/kvm/include/x86_64/processor.h | 23 ++++
.../testing/selftests/kvm/x86_64/state_test.c | 110 +++++++++++++++++-
8 files changed, 168 insertions(+), 33 deletions(-)
base-commit: 5804c19b80bf625c6a9925317f845e497434d6d3
--
2.42.0.582.g8ccd20d70d-goog
Changelog: v2 -> v3
* Minimal code refactoring
* Rebased on v6.6-rc1
RFC v1:
https://lore.kernel.org/all/20210611124154.56427-1-psampat@linux.ibm.com/
RFC v2:
https://lore.kernel.org/all/20230828061530.126588-2-aboorvad@linux.vnet.ibm…
Other related RFC:
https://lore.kernel.org/all/20210430082804.38018-1-psampat@linux.ibm.com/
Userspace selftest:
https://lkml.org/lkml/2020/9/2/356
----
A kernel module + userspace driver to estimate the wakeup latency
caused by going into stop states. The motivation behind this program is
to find significant deviations behind advertised latency and residency
values.
The patchset measures latencies for two kinds of events. IPIs and Timers
As this is a software-only mechanism, there will be additional latencies
of the kernel-firmware-hardware interactions. To account for that, the
program also measures a baseline latency on a 100 percent loaded CPU
and the latencies achieved must be in view relative to that.
To achieve this, we introduce a kernel module and expose its control
knobs through the debugfs interface that the selftests can engage with.
The kernel module provides the following interfaces within
/sys/kernel/debug/powerpc/latency_test/ for,
IPI test:
ipi_cpu_dest = Destination CPU for the IPI
ipi_cpu_src = Origin of the IPI
ipi_latency_ns = Measured latency time in ns
Timeout test:
timeout_cpu_src = CPU on which the timer to be queued
timeout_expected_ns = Timer duration
timeout_diff_ns = Difference of actual duration vs expected timer
Sample output is as follows:
# --IPI Latency Test---
# Baseline Avg IPI latency(ns): 2720
# Observed Avg IPI latency(ns) - State snooze: 2565
# Observed Avg IPI latency(ns) - State stop0_lite: 3856
# Observed Avg IPI latency(ns) - State stop0: 3670
# Observed Avg IPI latency(ns) - State stop1: 3872
# Observed Avg IPI latency(ns) - State stop2: 17421
# Observed Avg IPI latency(ns) - State stop4: 1003922
# Observed Avg IPI latency(ns) - State stop5: 1058870
#
# --Timeout Latency Test--
# Baseline Avg timeout diff(ns): 1435
# Observed Avg timeout diff(ns) - State snooze: 1709
# Observed Avg timeout diff(ns) - State stop0_lite: 2028
# Observed Avg timeout diff(ns) - State stop0: 1954
# Observed Avg timeout diff(ns) - State stop1: 1895
# Observed Avg timeout diff(ns) - State stop2: 14556
# Observed Avg timeout diff(ns) - State stop4: 873988
# Observed Avg timeout diff(ns) - State stop5: 959137
Aboorva Devarajan (2):
powerpc/cpuidle: cpuidle wakeup latency based on IPI and timer events
powerpc/selftest: Add support for cpuidle latency measurement
arch/powerpc/Kconfig.debug | 10 +
arch/powerpc/kernel/Makefile | 1 +
arch/powerpc/kernel/test_cpuidle_latency.c | 154 ++++++
tools/testing/selftests/powerpc/Makefile | 1 +
.../powerpc/cpuidle_latency/.gitignore | 2 +
.../powerpc/cpuidle_latency/Makefile | 6 +
.../cpuidle_latency/cpuidle_latency.sh | 443 ++++++++++++++++++
.../powerpc/cpuidle_latency/settings | 1 +
8 files changed, 618 insertions(+)
create mode 100644 arch/powerpc/kernel/test_cpuidle_latency.c
create mode 100644 tools/testing/selftests/powerpc/cpuidle_latency/.gitignore
create mode 100644 tools/testing/selftests/powerpc/cpuidle_latency/Makefile
create mode 100755 tools/testing/selftests/powerpc/cpuidle_latency/cpuidle_latency.sh
create mode 100644 tools/testing/selftests/powerpc/cpuidle_latency/settings
--
2.25.1
When execute the following command to test clone3 under !CONFIG_TIME_NS:
# make headers && cd tools/testing/selftests/clone3 && make && ./clone3
we can see the following error info:
# [7538] Trying clone3() with flags 0x80 (size 0)
# Invalid argument - Failed to create new process
# [7538] clone3() with flags says: -22 expected 0
not ok 18 [7538] Result (-22) is different than expected (0)
...
# Totals: pass:18 fail:1 xfail:0 xpass:0 skip:0 error:0
This is because if CONFIG_TIME_NS is not set, but the flag
CLONE_NEWTIME (0x80) is used to clone a time namespace, it
will return -EINVAL in copy_time_ns().
If kernel does not support CONFIG_TIME_NS, /proc/self/ns/time
will be not exist, and then we should skip clone3() test with
CLONE_NEWTIME.
With this patch under !CONFIG_TIME_NS:
# make headers && cd tools/testing/selftests/clone3 && make && ./clone3
...
# Time namespaces are not supported
ok 18 # SKIP Skipping clone3() with CLONE_NEWTIME
...
# Totals: pass:18 fail:0 xfail:0 xpass:0 skip:1 error:0
Fixes: 515bddf0ec41 ("selftests/clone3: test clone3 with CLONE_NEWTIME")
Suggested-by: Thomas Gleixner <tglx(a)linutronix.de>
Signed-off-by: Tiezhu Yang <yangtiezhu(a)loongson.cn>
---
v6: Rebase on 6.5-rc1 and update the commit message
tools/testing/selftests/clone3/clone3.c | 7 ++++++-
1 file changed, 6 insertions(+), 1 deletion(-)
diff --git a/tools/testing/selftests/clone3/clone3.c b/tools/testing/selftests/clone3/clone3.c
index e60cf4d..1c61e3c 100644
--- a/tools/testing/selftests/clone3/clone3.c
+++ b/tools/testing/selftests/clone3/clone3.c
@@ -196,7 +196,12 @@ int main(int argc, char *argv[])
CLONE3_ARGS_NO_TEST);
/* Do a clone3() in a new time namespace */
- test_clone3(CLONE_NEWTIME, 0, 0, CLONE3_ARGS_NO_TEST);
+ if (access("/proc/self/ns/time", F_OK) == 0) {
+ test_clone3(CLONE_NEWTIME, 0, 0, CLONE3_ARGS_NO_TEST);
+ } else {
+ ksft_print_msg("Time namespaces are not supported\n");
+ ksft_test_result_skip("Skipping clone3() with CLONE_NEWTIME\n");
+ }
/* Do a clone3() with exit signal (SIGCHLD) in flags */
test_clone3(SIGCHLD, 0, -EINVAL, CLONE3_ARGS_NO_TEST);
--
2.1.0
v2: Rebase on 6.5-rc1 and update the commit message
Tiezhu Yang (2):
selftests/vDSO: Add support for LoongArch
selftests/vDSO: Get version and name for all archs
tools/testing/selftests/vDSO/vdso_config.h | 6 ++++-
tools/testing/selftests/vDSO/vdso_test_getcpu.c | 16 +++++--------
.../selftests/vDSO/vdso_test_gettimeofday.c | 26 ++++++----------------
3 files changed, 18 insertions(+), 30 deletions(-)
--
2.1.0
Hi all,
This series implements the Permission Overlay Extension introduced in 2022
VMSA enhancements [1]. It is based on v6.6-rc3.
The Permission Overlay Extension allows to constrain permissions on memory
regions. This can be used from userspace (EL0) without a system call or TLB
invalidation.
POE is used to implement the Memory Protection Keys [2] Linux syscall.
The first few patches add the basic framework, then the PKEYS interface is
implemented, and then the selftests are made to work on arm64.
There was discussion about what the 'default' protection key value should be,
I used disallow-all (apart from pkey 0), which matches what x86 does.
Patch 15 contains a call to cpus_have_const_cap(), which I couldn't avoid
until Mark's patch to re-order when the alternatives were applied [3] is
committed.
The KVM part isn't tested yet.
I have tested the modified protection_keys test on x86_64 [4], but not PPC.
Hopefully I have CC'd everyone correctly.
Thanks,
Joey
Joey Gouly (20):
arm64/sysreg: add system register POR_EL{0,1}
arm64/sysreg: update CPACR_EL1 register
arm64: cpufeature: add Permission Overlay Extension cpucap
arm64: disable trapping of POR_EL0 to EL2
arm64: context switch POR_EL0 register
KVM: arm64: Save/restore POE registers
arm64: enable the Permission Overlay Extension for EL0
arm64: add POIndex defines
arm64: define VM_PKEY_BIT* for arm64
arm64: mask out POIndex when modifying a PTE
arm64: enable ARCH_HAS_PKEYS on arm64
arm64: handle PKEY/POE faults
arm64: stop using generic mm_hooks.h
arm64: implement PKEYS support
arm64: add POE signal support
arm64: enable PKEY support for CPUs with S1POE
arm64: enable POE and PIE to coexist
kselftest/arm64: move get_header()
selftests: mm: move fpregs printing
selftests: mm: make protection_keys test work on arm64
Documentation/arch/arm64/elf_hwcaps.rst | 3 +
arch/arm64/Kconfig | 1 +
arch/arm64/include/asm/el2_setup.h | 10 +-
arch/arm64/include/asm/hwcap.h | 1 +
arch/arm64/include/asm/kvm_host.h | 4 +
arch/arm64/include/asm/mman.h | 6 +-
arch/arm64/include/asm/mmu.h | 2 +
arch/arm64/include/asm/mmu_context.h | 51 ++++++-
arch/arm64/include/asm/pgtable-hwdef.h | 10 ++
arch/arm64/include/asm/pgtable-prot.h | 8 +-
arch/arm64/include/asm/pgtable.h | 28 +++-
arch/arm64/include/asm/pkeys.h | 110 ++++++++++++++
arch/arm64/include/asm/por.h | 33 +++++
arch/arm64/include/asm/processor.h | 1 +
arch/arm64/include/asm/sysreg.h | 16 ++
arch/arm64/include/asm/traps.h | 1 +
arch/arm64/include/uapi/asm/hwcap.h | 1 +
arch/arm64/include/uapi/asm/sigcontext.h | 7 +
arch/arm64/kernel/cpufeature.c | 15 ++
arch/arm64/kernel/cpuinfo.c | 1 +
arch/arm64/kernel/process.c | 16 ++
arch/arm64/kernel/signal.c | 51 +++++++
arch/arm64/kernel/traps.c | 12 +-
arch/arm64/kvm/sys_regs.c | 2 +
arch/arm64/mm/fault.c | 44 +++++-
arch/arm64/mm/mmap.c | 7 +
arch/arm64/mm/mmu.c | 38 +++++
arch/arm64/tools/cpucaps | 1 +
arch/arm64/tools/sysreg | 11 +-
fs/proc/task_mmu.c | 2 +
include/linux/mm.h | 11 +-
.../arm64/signal/testcases/testcases.c | 23 ---
.../arm64/signal/testcases/testcases.h | 26 +++-
tools/testing/selftests/mm/Makefile | 2 +-
tools/testing/selftests/mm/pkey-arm64.h | 138 ++++++++++++++++++
tools/testing/selftests/mm/pkey-helpers.h | 8 +
tools/testing/selftests/mm/pkey-powerpc.h | 3 +
tools/testing/selftests/mm/pkey-x86.h | 4 +
tools/testing/selftests/mm/protection_keys.c | 29 ++--
39 files changed, 685 insertions(+), 52 deletions(-)
create mode 100644 arch/arm64/include/asm/pkeys.h
create mode 100644 arch/arm64/include/asm/por.h
create mode 100644 tools/testing/selftests/mm/pkey-arm64.h
--
2.25.1
Zero out the buffer for readlink() since readlink() does not append a
terminating null byte to the buffer.
Fixes: 833c12ce0f430 ("selftests/x86/lam: Add inherit test cases for linear-address masking")
Signed-off-by: Binbin Wu <binbin.wu(a)linux.intel.com>
---
tools/testing/selftests/x86/lam.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tools/testing/selftests/x86/lam.c b/tools/testing/selftests/x86/lam.c
index eb0e46905bf9..9f06942a8e25 100644
--- a/tools/testing/selftests/x86/lam.c
+++ b/tools/testing/selftests/x86/lam.c
@@ -680,7 +680,7 @@ static int handle_execve(struct testcases *test)
perror("Fork failed.");
ret = 1;
} else if (pid == 0) {
- char path[PATH_MAX];
+ char path[PATH_MAX] = {0};
/* Set LAM mode in parent process */
if (set_lam(lam) != 0)
base-commit: ce9ecca0238b140b88f43859b211c9fdfd8e5b70
--
2.25.1
PASID (Process Address Space ID) is a PCIe extension to tag the DMA
transactions out of a physical device, and most modern IOMMU hardware
have supported PASID granular address translation. So a PASID-capable
devices can be attached to multiple hwpts (a.k.a. domains), each attachment
is tagged with a PASID.
This series first adds a missing iommu API to replace domain for a pasid,
then adds iommufd APIs for device drivers to attach/replace/detach pasid
to/from hwpt per userspace's request, and adds selftest to validate the
iommufd APIs.
pasid attach/replace is mandatory on Intel VT-d given the PASID table
locates in the physical address space hence must be managed by the kernel,
both for supporting vSVA and coming SIOV. But it's optional on ARM/AMD
which allow configuring the PASID/CD table either in host physical address space
or nested on top of an GPA address space. This series only add VT-d support
as the minimal requirement.
Complete code can be found in below link:
https://github.com/yiliu1765/iommufd/tree/iommufd_pasid
Regards,
Yi Liu
Kevin Tian (1):
iommufd: Support attach/replace hwpt per pasid
Lu Baolu (2):
iommu: Introduce a replace API for device pasid
iommu/vt-d: Add set_dev_pasid callback for nested domain
Yi Liu (5):
iommufd: replace attach_fn with a structure
iommufd/selftest: Add set_dev_pasid and remove_dev_pasid in mock iommu
iommufd/selftest: Add a helper to get test device
iommufd/selftest: Add test ops to test pasid attach/detach
iommufd/selftest: Add coverage for iommufd pasid attach/detach
drivers/iommu/intel/nested.c | 47 +++++
drivers/iommu/iommu-priv.h | 2 +
drivers/iommu/iommu.c | 73 ++++++--
drivers/iommu/iommufd/Makefile | 1 +
drivers/iommu/iommufd/device.c | 42 +++--
drivers/iommu/iommufd/iommufd_private.h | 16 ++
drivers/iommu/iommufd/iommufd_test.h | 24 +++
drivers/iommu/iommufd/pasid.c | 152 ++++++++++++++++
drivers/iommu/iommufd/selftest.c | 158 ++++++++++++++--
include/linux/iommufd.h | 6 +
tools/testing/selftests/iommu/iommufd.c | 172 ++++++++++++++++++
.../selftests/iommu/iommufd_fail_nth.c | 28 ++-
tools/testing/selftests/iommu/iommufd_utils.h | 78 ++++++++
13 files changed, 756 insertions(+), 43 deletions(-)
create mode 100644 drivers/iommu/iommufd/pasid.c
--
2.34.1
Context
=======
We've observed within Red Hat that isolated, NOHZ_FULL CPUs running a
pure-userspace application get regularly interrupted by IPIs sent from
housekeeping CPUs. Those IPIs are caused by activity on the housekeeping CPUs
leading to various on_each_cpu() calls, e.g.:
64359.052209596 NetworkManager 0 1405 smp_call_function_many_cond (cpu=0, func=do_kernel_range_flush)
smp_call_function_many_cond+0x1
smp_call_function+0x39
on_each_cpu+0x2a
flush_tlb_kernel_range+0x7b
__purge_vmap_area_lazy+0x70
_vm_unmap_aliases.part.42+0xdf
change_page_attr_set_clr+0x16a
set_memory_ro+0x26
bpf_int_jit_compile+0x2f9
bpf_prog_select_runtime+0xc6
bpf_prepare_filter+0x523
sk_attach_filter+0x13
sock_setsockopt+0x92c
__sys_setsockopt+0x16a
__x64_sys_setsockopt+0x20
do_syscall_64+0x87
entry_SYSCALL_64_after_hwframe+0x65
The heart of this series is the thought that while we cannot remove NOHZ_FULL
CPUs from the list of CPUs targeted by these IPIs, they may not have to execute
the callbacks immediately. Anything that only affects kernelspace can wait
until the next user->kernel transition, providing it can be executed "early
enough" in the entry code.
The original implementation is from Peter [1]. Nicolas then added kernel TLB
invalidation deferral to that [2], and I picked it up from there.
Deferral approach
=================
Storing each and every callback, like a secondary call_single_queue turned out
to be a no-go: the whole point of deferral is to keep NOHZ_FULL CPUs in
userspace for as long as possible - no signal of any form would be sent when
deferring an IPI. This means that any form of queuing for deferred callbacks
would end up as a convoluted memory leak.
Deferred IPIs must thus be coalesced, which this series achieves by assigning
IPIs a "type" and having a mapping of IPI type to callback, leveraged upon
kernel entry.
What about IPIs whose callback take a parameter, you may ask?
Peter suggested during OSPM23 [3] that since on_each_cpu() targets
housekeeping CPUs *and* isolated CPUs, isolated CPUs can access either global or
housekeeping-CPU-local state to "reconstruct" the data that would have been sent
via the IPI.
This series does not affect any IPI callback that requires an argument, but the
approach would remain the same (one coalescable callback executed on kernel
entry).
Kernel entry vs execution of the deferred operation
===================================================
There is a non-zero length of code that is executed upon kernel entry before the
deferred operation can be itself executed (i.e. before we start getting into
context_tracking.c proper).
This means one must take extra care to what can happen in the early entry code,
and that <bad things> cannot happen. For instance, we really don't want to hit
instructions that have been modified by a remote text_poke() while we're on our
way to execute a deferred sync_core().
Patches
=======
o Patches 1-9 have been submitted separately and are included for the sake of
testing
o Patches 10-14 focus on having objtool detect problematic static key usage in
early entry
o Patch 15 adds the infrastructure for IPI deferral.
o Patches 16-17 add some RCU testing infrastructure
o Patch 18 adds text_poke() IPI deferral.
o Patches 19-20 add vunmap() flush_tlb_kernel_range() IPI deferral
These ones I'm a lot less confident about, mostly due to lacking
instrumentation/verification.
The actual deferred callback is also incomplete as it's not properly noinstr:
vmlinux.o: warning: objtool: __flush_tlb_all_noinstr+0x19: call to native_write_cr4() leaves .noinstr.text section
and it doesn't support PARAVIRT - it's going to need a pv_ops.mmu entry, but I
have *no idea* what a sane implementation would be for Xen so I haven't
touched that yet.
Patches are also available at:
https://gitlab.com/vschneid/linux.git -b redhat/isolirq/defer/v2
Testing
=======
Note: this is a different machine than used for v1, because that machine decided
to act difficult.
Xeon E5-2699 system with SMToff, NOHZ_FULL, isolated CPUs.
RHEL9 userspace.
Workload is using rteval (kernel compilation + hackbench) on housekeeping CPUs
and a dummy stay-in-userspace loop on the isolated CPUs. The main invocation is:
$ trace-cmd record -e "csd_queue_cpu" -f "cpu & CPUS{$ISOL_CPUS}" \
-e "ipi_send_cpumask" -f "cpumask & CPUS{$ISOL_CPUS}" \
-e "ipi_send_cpu" -f "cpu & CPUS{$ISOL_CPUS}" \
rteval --onlyload --loads-cpulist=$HK_CPUS \
--hackbench-runlowmem=True --duration=$DURATION
This only records IPIs sent to isolated CPUs, so any event there is interference
(with a bit of fuzz at the start/end of the workload when spawning the
processes). All tests were done with a duration of 30 minutes.
v6.5-rc1 (+ cpumask filtering patches):
# This is the actual IPI count
$ trace-cmd report | grep callback | awk '{ print $(NF) }' | sort | uniq -c | sort -nr
338 callback=generic_smp_call_function_single_interrupt+0x0
# These are the different CSD's that caused IPIs
$ trace-cmd report | grep csd_queue | awk '{ print $(NF-1) }' | sort | uniq -c | sort -nr
9207 func=do_flush_tlb_all
1116 func=do_sync_core
62 func=do_kernel_range_flush
3 func=nohz_full_kick_func
v6.5-rc1 + patches:
# This is the actual IPI count
$ trace-cmd report | grep callback | awk '{ print $(NF) }' | sort | uniq -c | sort -nr
2 callback=generic_smp_call_function_single_interrupt+0x0
# These are the different CSD's that caused IPIs
$ trace-cmd report | grep csd_queue | awk '{ print $(NF-1) }' | sort | uniq -c | sort -nr
2 func=nohz_full_kick_func
The incriminating IPIs are all gone, but note that on the machine I used to test
v1 there were still some do_flush_tlb_all() IPIs caused by
pcpu_balance_workfn(), since only vmalloc is affected by the deferral
mechanism.
Acknowledgements
================
Special thanks to:
o Clark Williams for listening to my ramblings about this and throwing ideas my way
o Josh Poimboeuf for his guidance regarding objtool and hinting at the
.data..ro_after_init section.
Links
=====
[1]: https://lore.kernel.org/all/20210929151723.162004989@infradead.org/
[2]: https://github.com/vianpl/linux.git -b ct-work-defer-wip
[3]: https://youtu.be/0vjE6fjoVVE
Revisions
=========
RFCv1 -> RFCv2
++++++++++++++
o Rebased onto v6.5-rc1
o Updated the trace filter patches (Steven)
o Fixed __ro_after_init keys used in modules (Peter)
o Dropped the extra context_tracking atomic, squashed the new bits in the
existing .state field (Peter, Frederic)
o Added an RCU_EXPERT config for the RCU dynticks counter size, and added an
rcutorture case for a low-size counter (Paul)
The new TREE11 case with a 2-bit dynticks counter seems to pass when ran
against this series.
o Fixed flush_tlb_kernel_range_deferrable() definition
Peter Zijlstra (1):
jump_label,module: Don't alloc static_key_mod for __ro_after_init keys
Valentin Schneider (19):
tracing/filters: Dynamically allocate filter_pred.regex
tracing/filters: Enable filtering a cpumask field by another cpumask
tracing/filters: Enable filtering a scalar field by a cpumask
tracing/filters: Enable filtering the CPU common field by a cpumask
tracing/filters: Optimise cpumask vs cpumask filtering when user mask
is a single CPU
tracing/filters: Optimise scalar vs cpumask filtering when the user
mask is a single CPU
tracing/filters: Optimise CPU vs cpumask filtering when the user mask
is a single CPU
tracing/filters: Further optimise scalar vs cpumask comparison
tracing/filters: Document cpumask filtering
objtool: Flesh out warning related to pv_ops[] calls
objtool: Warn about non __ro_after_init static key usage in .noinstr
context_tracking: Make context_tracking_key __ro_after_init
x86/kvm: Make kvm_async_pf_enabled __ro_after_init
context-tracking: Introduce work deferral infrastructure
rcu: Make RCU dynticks counter size configurable
rcutorture: Add a test config to torture test low RCU_DYNTICKS width
context_tracking,x86: Defer kernel text patching IPIs
context_tracking,x86: Add infrastructure to defer kernel TLBI
x86/mm, mm/vmalloc: Defer flush_tlb_kernel_range() targeting NOHZ_FULL
CPUs
Documentation/trace/events.rst | 14 +
arch/Kconfig | 9 +
arch/x86/Kconfig | 1 +
arch/x86/include/asm/context_tracking_work.h | 20 ++
arch/x86/include/asm/text-patching.h | 1 +
arch/x86/include/asm/tlbflush.h | 2 +
arch/x86/kernel/alternative.c | 24 +-
arch/x86/kernel/kprobes/core.c | 4 +-
arch/x86/kernel/kprobes/opt.c | 4 +-
arch/x86/kernel/kvm.c | 2 +-
arch/x86/kernel/module.c | 2 +-
arch/x86/mm/tlb.c | 40 ++-
include/asm-generic/sections.h | 5 +
include/linux/context_tracking.h | 26 ++
include/linux/context_tracking_state.h | 65 +++-
include/linux/context_tracking_work.h | 28 ++
include/linux/jump_label.h | 1 +
include/linux/trace_events.h | 1 +
init/main.c | 1 +
kernel/context_tracking.c | 53 ++-
kernel/jump_label.c | 49 +++
kernel/rcu/Kconfig | 33 ++
kernel/time/Kconfig | 5 +
kernel/trace/trace_events_filter.c | 302 ++++++++++++++++--
mm/vmalloc.c | 19 +-
tools/objtool/check.c | 22 +-
tools/objtool/include/objtool/check.h | 1 +
tools/objtool/include/objtool/special.h | 2 +
tools/objtool/special.c | 3 +
.../selftests/rcutorture/configs/rcu/TREE11 | 19 ++
.../rcutorture/configs/rcu/TREE11.boot | 1 +
31 files changed, 695 insertions(+), 64 deletions(-)
create mode 100644 arch/x86/include/asm/context_tracking_work.h
create mode 100644 include/linux/context_tracking_work.h
create mode 100644 tools/testing/selftests/rcutorture/configs/rcu/TREE11
create mode 100644 tools/testing/selftests/rcutorture/configs/rcu/TREE11.boot
--
2.31.1
The indentation of parameterized tests messages is currently broken in kunit.
Try to fix that by introducing a test level attribute, that will be increased
during nested parameterized tests execution, and use it to generate correct
indent at the runtime when printing message or writing them to the log.
Also improve kunit by providing test plan for the parameterized tests.
Cc: David Gow <davidgow(a)google.com>
Cc: Rae Moar <rmoar(a)google.com>
Michal Wajdeczko (4):
kunit: Drop redundant text from suite init failure message
kunit: Fix indentation level of suite messages
kunit: Fix indentation of parameterized tests messages
kunit: Prepare test plan for parameterized subtests
include/kunit/test.h | 25 ++++++++++++--
lib/kunit/test.c | 79 +++++++++++++++++++++++---------------------
2 files changed, 65 insertions(+), 39 deletions(-)
--
2.25.1