From: Ira Weiny <ira.weiny(a)intel.com>
This RFC series has been reviewed by Dave Hansen.
Changes from RFC:
Clean up commit messages based on Peter Zijlstra's and Dave Hansen's
feedback
Fix static branch anti-pattern
New patch:
(memremap: Convert devmap static branch to {inc,dec})
This was the code I used as a model for my static branch which
I believe is wrong now.
New Patch:
(x86/entry: Preserve PKRS MSR through exceptions)
This attempts to preserve the per-logical-processor MSR, and
reference counting during exceptions. I'd really like feed
back on this because I _think_ it should work but I'm afraid
I'm missing something as my testing has shown a lot of spotty
crashes which don't make sense to me.
This patch set introduces a new page protection mechanism for supervisor pages,
Protection Key Supervisor (PKS) and an initial user of them, persistent memory,
PMEM.
PKS enables protections on 'domains' of supervisor pages to limit supervisor
mode access to those pages beyond the normal paging protections. They work in
a similar fashion to user space pkeys. Like User page pkeys (PKU), supervisor
pkeys are checked in addition to normal paging protections and Access or Writes
can be disabled via a MSR update without TLB flushes when permissions change.
A page mapping is assigned to a domain by setting a pkey in the page table
entry.
Unlike User pkeys no new instructions are added; rather WRMSR/RDMSR are used to
update the PKRS register.
XSAVE is not supported for the PKRS MSR. To reduce software complexity the
implementation saves/restores the MSR across context switches but not during
irqs. This is a compromise which results is a hardening of unwanted access
without absolute restriction.
For consistent behavior with current paging protections, pkey 0 is reserved and
configured to allow full access via the pkey mechanism, thus preserving the
default paging protections on mappings with the default pkey value of 0.
Other keys, (1-15) are allocated by an allocator which prepares us for key
contention from day one. Kernel users should be prepared for the allocator to
fail either because of key exhaustion or due to PKS not being supported on the
arch and/or CPU instance.
Protecting against stray writes is particularly important for PMEM because,
unlike writes to anonymous memory, writes to PMEM persists across a reboot.
Thus data corruption could result in permanent loss of data.
The following attributes of PKS makes it perfect as a mechanism to protect PMEM
from stray access within the kernel:
1) Fast switching of permissions
2) Prevents access without page table manipulations
3) Works on a per thread basis
4) No TLB flushes required
The second half of this series thus uses the PKS mechanism to protect PMEM from
stray access.
PKS is available with 4 and 5 level paging. Like PKRU is takes 4 bits from the
PTE to store the pkey within the entry.
Implementation details
----------------------
Modifications of task struct in patches:
(x86/pks: Preserve the PKRS MSR on context switch)
(memremap: Add zone device access protection)
Because pkey access is per-thread 2 modifications are made to the task struct.
The first is a saved copy of the MSR during context switches. The second
reference counts access to the device domain to correctly handle kmap nesting
properly.
Maintain PKS setting in a re-entrant manner in patch:
(memremap: Add zone device access protection)
(x86/entry: Preserve PKRS MSR through exceptions)
Using local_irq_save() seems to be the safest and fastest way to maintain kmap
as re-entrant. But there may be a better way. spin_lock_irq() and atomic
counters were considered. But atomic counters do not properly protect the pkey
update and spin_lock_irq() would deadlock. Suggestions are welcome.
Also preserving the pks state requires the exception handling code to store the
ref count during exception processing. This seems like a layering violation
but it works.
The use of kmap in patch:
(kmap: Add stray write protection for device pages)
To keep general access to PMEM pages general, we piggy back on the kmap()
interface as there are many places in the kernel who do not have, nor should be
required to have, a priori knowledge that a page is PMEM. The modifications to
the kmap code is careful to quickly determine which pages don't require special
handling to reduce overhead for non PMEM pages.
Breakdown of patches
--------------------
Implement PKS within x86 arch:
x86/pkeys: Create pkeys_internal.h
x86/fpu: Refactor arch_set_user_pkey_access() for PKS support
x86/pks: Enable Protection Keys Supervisor (PKS)
x86/pks: Preserve the PKRS MSR on context switch
x86/pks: Add PKS kernel API
x86/pks: Add a debugfs file for allocated PKS keys
Documentation/pkeys: Update documentation for kernel pkeys
x86/pks: Add PKS Test code
pre-req bug fixes for dax:
fs/dax: Remove unused size parameter
drivers/dax: Expand lock scope to cover the use of addresses
Add stray write protection to PMEM:
memremap: Add zone device access protection
kmap: Add stray write protection for device pages
dax: Stray write protection for dax_direct_access()
nvdimm/pmem: Stray write protection for pmem->virt_addr
[dax|pmem]: Enable stray write protection
Fenghua Yu (4):
x86/fpu: Refactor arch_set_user_pkey_access() for PKS support
x86/pks: Enable Protection Keys Supervisor (PKS)
x86/pks: Add PKS kernel API
x86/pks: Add a debugfs file for allocated PKS keys
Ira Weiny (13):
x86/pkeys: Create pkeys_internal.h
x86/pks: Preserve the PKRS MSR on context switch
Documentation/pkeys: Update documentation for kernel pkeys
x86/pks: Add PKS Test code
memremap: Convert devmap static branch to {inc,dec}
fs/dax: Remove unused size parameter
drivers/dax: Expand lock scope to cover the use of addresses
memremap: Add zone device access protection
kmap: Add stray write protection for device pages
dax: Stray write protection for dax_direct_access()
nvdimm/pmem: Stray write protection for pmem->virt_addr
[dax|pmem]: Enable stray write protection
x86/entry: Preserve PKRS MSR across exceptions
Documentation/core-api/protection-keys.rst | 81 +++-
arch/x86/Kconfig | 1 +
arch/x86/entry/common.c | 78 +++-
arch/x86/include/asm/cpufeatures.h | 1 +
arch/x86/include/asm/idtentry.h | 2 +
arch/x86/include/asm/msr-index.h | 1 +
arch/x86/include/asm/pgtable.h | 13 +-
arch/x86/include/asm/pgtable_types.h | 4 +
arch/x86/include/asm/pkeys.h | 43 ++
arch/x86/include/asm/pkeys_internal.h | 36 ++
arch/x86/include/asm/processor.h | 13 +
arch/x86/include/uapi/asm/processor-flags.h | 2 +
arch/x86/kernel/cpu/common.c | 17 +
arch/x86/kernel/fpu/xstate.c | 17 +-
arch/x86/kernel/process.c | 34 ++
arch/x86/mm/fault.c | 16 +-
arch/x86/mm/pkeys.c | 174 +++++++-
drivers/dax/device.c | 2 +
drivers/dax/super.c | 5 +-
drivers/nvdimm/pmem.c | 6 +
fs/dax.c | 13 +-
include/linux/highmem.h | 32 +-
include/linux/memremap.h | 1 +
include/linux/mm.h | 33 ++
include/linux/pkeys.h | 18 +
include/linux/sched.h | 3 +
init/init_task.c | 3 +
kernel/fork.c | 3 +
lib/Kconfig.debug | 12 +
lib/Makefile | 3 +
lib/pks/Makefile | 3 +
lib/pks/pks_test.c | 452 ++++++++++++++++++++
mm/Kconfig | 15 +
mm/memremap.c | 105 ++++-
tools/testing/selftests/x86/Makefile | 3 +-
tools/testing/selftests/x86/test_pks.c | 65 +++
36 files changed, 1243 insertions(+), 67 deletions(-)
create mode 100644 arch/x86/include/asm/pkeys_internal.h
create mode 100644 lib/pks/Makefile
create mode 100644 lib/pks/pks_test.c
create mode 100644 tools/testing/selftests/x86/test_pks.c
--
2.28.0.rc0.12.gb6a658bd00c9
This adds the conversion of the test_sort.c to KUnit test.
Please apply this commit first (linux-kselftest/kunit-fixes):
3f37d14b8a3152441f36b6bc74000996679f0998 kunit: kunit_config: Fix parsing of CONFIG options with space
Signed-off-by: Vitor Massaru Iha <vitor(a)massaru.org>
---
lib/Kconfig.debug | 26 +++++++++++++++++---------
lib/Makefile | 2 +-
lib/{test_sort.c => sort_kunit.c} | 31 +++++++++++++++----------------
3 files changed, 33 insertions(+), 26 deletions(-)
rename lib/{test_sort.c => sort_kunit.c} (55%)
diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index 9ad9210d70a1..1fe19e78d7ca 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -1874,15 +1874,6 @@ config TEST_MIN_HEAP
If unsure, say N.
-config TEST_SORT
- tristate "Array-based sort test"
- depends on DEBUG_KERNEL || m
- help
- This option enables the self-test function of 'sort()' at boot,
- or at module load time.
-
- If unsure, say N.
-
config KPROBES_SANITY_TEST
bool "Kprobes sanity tests"
depends on DEBUG_KERNEL
@@ -2185,6 +2176,23 @@ config LINEAR_RANGES_TEST
If unsure, say N.
+config SORT_KUNIT
+ tristate "KUnit test for Array-based sort"
+ depends on DEBUG_KERNEL || m
+ help
+ This option enables the KUnit function of 'sort()' at boot,
+ or at module load time.
+
+ KUnit tests run during boot and output the results to the debug log
+ in TAP format (http://testanything.org/). Only useful for kernel devs
+ running the KUnit test harness, and not intended for inclusion into a
+ production build.
+
+ For more information on KUnit and unit tests in general please refer
+ to the KUnit documentation in Documentation/dev-tools/kunit/.
+
+ If unsure, say N.
+
config TEST_UDELAY
tristate "udelay test driver"
help
diff --git a/lib/Makefile b/lib/Makefile
index b1c42c10073b..c22bb13b0a08 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -77,7 +77,6 @@ obj-$(CONFIG_TEST_LKM) += test_module.o
obj-$(CONFIG_TEST_VMALLOC) += test_vmalloc.o
obj-$(CONFIG_TEST_OVERFLOW) += test_overflow.o
obj-$(CONFIG_TEST_RHASHTABLE) += test_rhashtable.o
-obj-$(CONFIG_TEST_SORT) += test_sort.o
obj-$(CONFIG_TEST_USER_COPY) += test_user_copy.o
obj-$(CONFIG_TEST_STATIC_KEYS) += test_static_keys.o
obj-$(CONFIG_TEST_STATIC_KEYS) += test_static_key_base.o
@@ -318,3 +317,4 @@ obj-$(CONFIG_OBJAGG) += objagg.o
# KUnit tests
obj-$(CONFIG_LIST_KUNIT_TEST) += list-test.o
obj-$(CONFIG_LINEAR_RANGES_TEST) += test_linear_ranges.o
+obj-$(CONFIG_SORT_KUNIT) += sort_kunit.o
diff --git a/lib/test_sort.c b/lib/sort_kunit.c
similarity index 55%
rename from lib/test_sort.c
rename to lib/sort_kunit.c
index 52edbe10f2e5..03ba1cf1285c 100644
--- a/lib/test_sort.c
+++ b/lib/sort_kunit.c
@@ -1,7 +1,6 @@
// SPDX-License-Identifier: GPL-2.0-only
#include <linux/sort.h>
-#include <linux/slab.h>
-#include <linux/module.h>
+#include <kunit/test.h>
/* a simple boot-time regression test */
@@ -12,13 +11,12 @@ static int __init cmpint(const void *a, const void *b)
return *(int *)a - *(int *)b;
}
-static int __init test_sort_init(void)
+static void __init sort_test(struct kunit *test)
{
- int *a, i, r = 1, err = -ENOMEM;
+ int *a, i, r = 1;
a = kmalloc_array(TEST_LEN, sizeof(*a), GFP_KERNEL);
- if (!a)
- return err;
+ KUNIT_ASSERT_FALSE_MSG(test, a == NULL, "kmalloc_array failed");
for (i = 0; i < TEST_LEN; i++) {
r = (r * 725861) % 6599;
@@ -27,24 +25,25 @@ static int __init test_sort_init(void)
sort(a, TEST_LEN, sizeof(*a), cmpint, NULL);
- err = -EINVAL;
for (i = 0; i < TEST_LEN-1; i++)
if (a[i] > a[i+1]) {
- pr_err("test has failed\n");
+ KUNIT_FAIL(test, "test has failed");
goto exit;
}
- err = 0;
- pr_info("test passed\n");
exit:
kfree(a);
- return err;
}
-static void __exit test_sort_exit(void)
-{
-}
+static struct kunit_case sort_test_cases[] = {
+ KUNIT_CASE(sort_test),
+ {}
+};
+
+static struct kunit_suite sort_test_suite = {
+ .name = "sort",
+ .test_cases = sort_test_cases,
+};
-module_init(test_sort_init);
-module_exit(test_sort_exit);
+kunit_test_suites(&sort_test_suite);
MODULE_LICENSE("GPL");
base-commit: d43c7fb05765152d4d4a39a8ef957c4ea14d8847
--
2.26.2
Add a cleanup() path upon exit, making it possible to run the test twice in a
row:
$ sudo bash -x ./txtimestamp.sh
+ set -e
++ ip netns identify
+ [[ '' == \r\o\o\t ]]
+ main
+ [[ 0 -eq 0 ]]
+ run_test_all
+ setup
+ tc qdisc add dev lo root netem delay 1ms
Error: Exclusivity flag on, cannot modify.
Signed-off-by: Paolo Pisati <paolo.pisati(a)canonical.com>
---
tools/testing/selftests/net/txtimestamp.sh | 10 ++++++++++
1 file changed, 10 insertions(+)
diff --git a/tools/testing/selftests/net/txtimestamp.sh b/tools/testing/selftests/net/txtimestamp.sh
index eea6f5193693..77f29cabff87 100755
--- a/tools/testing/selftests/net/txtimestamp.sh
+++ b/tools/testing/selftests/net/txtimestamp.sh
@@ -23,6 +23,14 @@ setup() {
action mirred egress redirect dev ifb_netem0
}
+cleanup() {
+ tc filter del dev lo parent ffff:
+ tc qdisc del dev lo handle ffff: ingress
+ tc qdisc del dev ifb_netem0 root
+ ip link del ifb_netem0
+ tc qdisc del dev lo root
+}
+
run_test_v4v6() {
# SND will be delayed 1000us
# ACK will be delayed 6000us: 1 + 2 ms round-trip
@@ -75,6 +83,8 @@ main() {
fi
}
+trap cleanup EXIT
+
if [[ "$(ip netns identify)" == "root" ]]; then
./in_netns.sh $0 $@
else
--
2.27.0
The goal for this series is to avoid device private memory TLB
invalidations when migrating a range of addresses from system
memory to device private memory and some of those pages have already
been migrated. The approach taken is to introduce a new mmu notifier
invalidation event type and use that in the device driver to skip
invalidation callbacks from migrate_vma_setup(). The device driver is
also then expected to handle device MMU invalidations as part of the
migrate_vma_setup(), migrate_vma_pages(), migrate_vma_finalize() process.
Note that this is opt-in. A device driver can simply invalidate its MMU
in the mmu notifier callback and not handle MMU invalidations in the
migration sequence.
This series is based on Jason Gunthorpe's HMM tree (linux-5.8.0-rc4).
Also, this replaces the need for the following two patches I sent:
("mm: fix migrate_vma_setup() src_owner and normal pages")
https://lore.kernel.org/linux-mm/20200622222008.9971-1-rcampbell@nvidia.com
("nouveau: fix mixed normal and device private page migration")
https://lore.kernel.org/lkml/20200622233854.10889-3-rcampbell@nvidia.com
Bharata Rao, let me know if I can add your reviewed-by back since
I made a fair number of changes to this version of the series.
Changes in v3:
Changed the direction field "dir" to a "flags" field and renamed
src_owner to pgmap_owner.
Fixed a locking issue in nouveau for the migration invalidation.
Added a HMM selftest test case to exercise the HMM test driver
invalidation changes.
Removed reviewed-by Bharata B Rao since this version is moderately
changed.
Changes in v2:
Rebase to Jason Gunthorpe's HMM tree.
Added reviewed-by from Bharata B Rao.
Rename the mmu_notifier_range::data field to migrate_pgmap_owner as
suggested by Jason Gunthorpe.
Ralph Campbell (5):
nouveau: fix storing invalid ptes
mm/migrate: add a flags parameter to migrate_vma
mm/notifier: add migration invalidation type
nouveau/svm: use the new migration invalidation
mm/hmm/test: use the new migration invalidation
arch/powerpc/kvm/book3s_hv_uvmem.c | 4 ++-
drivers/gpu/drm/nouveau/nouveau_dmem.c | 19 ++++++++---
drivers/gpu/drm/nouveau/nouveau_svm.c | 21 +++++-------
drivers/gpu/drm/nouveau/nouveau_svm.h | 13 ++++++-
.../drm/nouveau/nvkm/subdev/mmu/vmmgp100.c | 13 ++++---
include/linux/migrate.h | 16 ++++++---
include/linux/mmu_notifier.h | 7 ++++
lib/test_hmm.c | 34 +++++++++++--------
mm/migrate.c | 14 ++++++--
tools/testing/selftests/vm/hmm-tests.c | 18 +++++++---
10 files changed, 112 insertions(+), 47 deletions(-)
--
2.20.1
KUnit test cases run on kthreads, and kthreads don't have an
adddress space (current->mm is NULL), but processes have mm.
The purpose of this patch is to allow to borrow mm to KUnit kthread
after userspace is brought up, because we know that there are processes
running, at least the process that loaded the module to borrow mm.
This allows, for example, tests such as user_copy_kunit, which uses
vm_mmap, which needs current->mm.
Signed-off-by: Vitor Massaru Iha <vitor(a)massaru.org>
---
v2:
* splitted patch in 3:
- Allows to install and load modules in root filesystem;
- Provides an userspace memory context when tests are compiled
as module;
- Convert test_user_copy to KUnit test;
* added documentation;
* added more explanation;
* added a missed test pointer;
* released mm with mmput();
v3:
* rebased with last kunit branch
* Please apply this commit from kunit-fixes:
3f37d14b8a3152441f36b6bc74000996679f0998
Documentation/dev-tools/kunit/usage.rst | 14 ++++++++++++++
include/kunit/test.h | 12 ++++++++++++
lib/kunit/try-catch.c | 15 ++++++++++++++-
3 files changed, 40 insertions(+), 1 deletion(-)
---
diff --git a/Documentation/dev-tools/kunit/usage.rst b/Documentation/dev-tools/kunit/usage.rst
index 3c3fe8b5fecc..9f909157be34 100644
--- a/Documentation/dev-tools/kunit/usage.rst
+++ b/Documentation/dev-tools/kunit/usage.rst
@@ -448,6 +448,20 @@ We can now use it to test ``struct eeprom_buffer``:
.. _kunit-on-non-uml:
+User-space context
+------------------
+
+I case you need a user-space context, for now this is only possible through
+tests compiled as a module. And it will be necessary to use a root filesystem
+and uml_utilities.
+
+Example:
+
+.. code-block:: bash
+
+ ./tools/testing/kunit/kunit.py run --timeout=60 --uml_rootfs_dir=.uml_rootfs
+
+
KUnit on non-UML architectures
==============================
diff --git a/include/kunit/test.h b/include/kunit/test.h
index 59f3144f009a..ae3337139c65 100644
--- a/include/kunit/test.h
+++ b/include/kunit/test.h
@@ -222,6 +222,18 @@ struct kunit {
* protect it with some type of lock.
*/
struct list_head resources; /* Protected by lock. */
+ /*
+ * KUnit test cases run on kthreads, and kthreads don't have an
+ * adddress space (current->mm is NULL), but processes have mm.
+ *
+ * The purpose of this mm_struct is to allow to borrow mm to KUnit kthread
+ * after userspace is brought up, because we know that there are processes
+ * running, at least the process that loaded the module to borrow mm.
+ *
+ * This allows, for example, tests such as user_copy_kunit, which uses
+ * vm_mmap, which needs current->mm.
+ */
+ struct mm_struct *mm;
};
void kunit_init_test(struct kunit *test, const char *name, char *log);
diff --git a/lib/kunit/try-catch.c b/lib/kunit/try-catch.c
index 0dd434e40487..d03e2093985b 100644
--- a/lib/kunit/try-catch.c
+++ b/lib/kunit/try-catch.c
@@ -11,7 +11,8 @@
#include <linux/completion.h>
#include <linux/kernel.h>
#include <linux/kthread.h>
-
+#include <linux/sched/mm.h>
+#include <linux/sched/task.h>
#include "try-catch-impl.h"
void __noreturn kunit_try_catch_throw(struct kunit_try_catch *try_catch)
@@ -24,8 +25,17 @@ EXPORT_SYMBOL_GPL(kunit_try_catch_throw);
static int kunit_generic_run_threadfn_adapter(void *data)
{
struct kunit_try_catch *try_catch = data;
+ struct kunit *test = try_catch->test;
+
+ if (test != NULL && test->mm != NULL)
+ kthread_use_mm(test->mm);
try_catch->try(try_catch->context);
+ if (test != NULL && test->mm != NULL) {
+ kthread_unuse_mm(test->mm);
+ mmput(test->mm);
+ test->mm = NULL;
+ }
complete_and_exit(try_catch->try_completion, 0);
}
@@ -65,6 +75,9 @@ void kunit_try_catch_run(struct kunit_try_catch *try_catch, void *context)
try_catch->context = context;
try_catch->try_completion = &try_completion;
try_catch->try_result = 0;
+
+ test->mm = get_task_mm(current);
+
task_struct = kthread_run(kunit_generic_run_threadfn_adapter,
try_catch,
"kunit_try_catch_thread");
base-commit: d43c7fb05765152d4d4a39a8ef957c4ea14d8847
--
2.26.2
This patch series extends the previously add __ksym externs with btf
info.
Right now the __ksym externs are treated as pure 64-bit scalar value.
Libbpf replaces ld_imm64 insn of __ksym by its kernel address at load
time. This patch series extend those extern with their btf info. Note
that btf support for __ksym must come with the btf that has VARs encoded
to work properly. Therefore, these patches are tested against a btf
generated by a patched pahole, whose change will available in released
pahole soon.
There are a couple of design choices that I would like feedbacks from
bpf/btf experts.
1. Because the newly added pseudo_btf_id needs to carry both a kernel
address (64 bits) and a btf id (32 bits), I used the 'off' fields
of ld_imm insn to carry btf id. I wonder if this breaks anything or
if there is a better idea.
2. Since only a subset of vars are going to be encoded into the new
btf, if a ksym that doesn't find its btf id, it doesn't get
converted into pseudo_btf_id. It is still treated as pure scalar
value. But we require kernel btf to be loaded in libbpf if there is
any ksym in the bpf prog.
This is RFC as it requires pahole changes that encode kernel vars into
btf.
Hao Luo (2):
bpf: BTF support for __ksym externs
selftests/bpf: Test __ksym externs with BTF
include/uapi/linux/bpf.h | 37 ++++++++++----
kernel/bpf/verifier.c | 26 ++++++++--
tools/include/uapi/linux/bpf.h | 37 ++++++++++----
tools/lib/bpf/libbpf.c | 50 ++++++++++++++++++-
.../testing/selftests/bpf/prog_tests/ksyms.c | 2 +
.../testing/selftests/bpf/progs/test_ksyms.c | 14 ++++++
6 files changed, 143 insertions(+), 23 deletions(-)
--
2.27.0.389.gc38d7665816-goog