From: Zi Yan <ziy(a)nvidia.com>
Hi all,
File folio supports any order and people would like to support flexible orders
for anonymous folio[1] too. Currently, split_huge_page() only splits a huge
page to order-0 pages, but splitting to orders higher than 0 is also useful.
This patchset adds support for splitting a huge page to any lower order pages
and uses it during file folio truncate operations.
The patchset is on top of mm-everything-2023-03-27-21-20.
Changelog
===
Since v2
---
1. Fixed an issue in __split_page_owner() introduced during my rebase
Since v1
---
1. Changed split_page_memcg() and split_page_owner() parameter to use order
2. Used folio_test_pmd_mappable() in place of the equivalent code
Details
===
* Patch 1 changes split_page_memcg() to use order instead of nr_pages
* Patch 2 changes split_page_owner() to use order instead of nr_pages
* Patch 3 and 4 add new_order parameter split_page_memcg() and
split_page_owner() and prepare for upcoming changes.
* Patch 5 adds split_huge_page_to_list_to_order() to split a huge page
to any lower order. The original split_huge_page_to_list() calls
split_huge_page_to_list_to_order() with new_order = 0.
* Patch 6 uses split_huge_page_to_list_to_order() in large pagecache folio
truncation instead of split the large folio all the way down to order-0.
* Patch 7 adds a test API to debugfs and test cases in
split_huge_page_test selftests.
Comments and/or suggestions are welcome.
[1] https://lore.kernel.org/linux-mm/Y%2FblF0GIunm+pRIC@casper.infradead.org/
Zi Yan (7):
mm/memcg: use order instead of nr in split_page_memcg()
mm/page_owner: use order instead of nr in split_page_owner()
mm: memcg: make memcg huge page split support any order split.
mm: page_owner: add support for splitting to any order in split
page_owner.
mm: thp: split huge page to any lower order pages.
mm: truncate: split huge page cache page to a non-zero order if
possible.
mm: huge_memory: enable debugfs to split huge pages to any order.
include/linux/huge_mm.h | 10 +-
include/linux/memcontrol.h | 4 +-
include/linux/page_owner.h | 10 +-
mm/huge_memory.c | 137 ++++++++---
mm/memcontrol.c | 10 +-
mm/page_alloc.c | 8 +-
mm/page_owner.c | 8 +-
mm/truncate.c | 21 +-
.../selftests/mm/split_huge_page_test.c | 225 +++++++++++++++++-
9 files changed, 365 insertions(+), 68 deletions(-)
--
2.39.2
Here's a follow-up from my RFC series last year:
https://lore.kernel.org/lkml/20221004093131.40392-1-thuth@redhat.com/T/
and from v1 earlier this year:
https://lore.kernel.org/kvm/20230712075910.22480-1-thuth@redhat.com/
Basic idea of this series is now to use the kselftest_harness.h
framework to get TAP output in the tests, so that it is easier
for the user to see what is going on, and e.g. to be able to
detect whether a certain test is part of the test binary or not
(which is useful when tests get extended in the course of time).
v2:
- Dropped the "Rename the ASSERT_EQ macro" patch (already merged)
- Split the fixes in the sync_regs_test into separate patches
(see the first two patches)
- Introduce the KVM_ONE_VCPU_TEST_SUITE() macro as suggested
by Sean (see third patch) and use it in the following patches
- Add a new patch to convert vmx_pmu_caps_test.c, too
Thomas Huth (7):
KVM: selftests: x86: sync_regs_test: Use vcpu_run() where appropriate
KVM: selftests: x86: sync_regs_test: Get regs structure before
modifying it
KVM: selftests: Add a macro to define a test with one vcpu
KVM: selftests: x86: Use TAP interface in the sync_regs test
KVM: selftests: x86: Use TAP interface in the fix_hypercall test
KVM: selftests: x86: Use TAP interface in the vmx_pmu_caps test
KVM: selftests: x86: Use TAP interface in the userspace_msr_exit test
.../selftests/kvm/include/kvm_test_harness.h | 35 +++++
.../selftests/kvm/x86_64/fix_hypercall_test.c | 27 ++--
.../selftests/kvm/x86_64/sync_regs_test.c | 121 +++++++++++++-----
.../kvm/x86_64/userspace_msr_exit_test.c | 19 +--
.../selftests/kvm/x86_64/vmx_pmu_caps_test.c | 50 ++------
5 files changed, 160 insertions(+), 92 deletions(-)
create mode 100644 tools/testing/selftests/kvm/include/kvm_test_harness.h
--
2.41.0
Make sv48 the default address space for mmap as some applications
currently depend on this assumption. Users can now select a
desired address space using a non-zero hint address to mmap. Previously,
requesting the default address space from mmap by passing zero as the hint
address would result in using the largest address space possible. Some
applications depend on empty bits in the virtual address space, like Go and
Java, so this patch provides more flexibility for application developers.
-Charlie
---
v10:
- Move pgtable.h defintions into a no __ASSEMBLY__ region to resolve compilation
conflicts (pointed out by Conor)
- Will now compile with allmodconfig
v9:
- Raise the mmap_end default to STACK_TOP_MAX to allow the address space to grow
beyond the default of sv48 on sv57 machines as suggested by Alexandre
- Some of the mmap macros had unnecessary conditionals that I have removed
v8:
- Fix RV32 and the RV32 compat mode of RV64 (suggested by Conor)
- Extract out addr and base from the mmap macros (suggested by Alexandre)
v7:
- Changing RLIMIT_STACK inside of an executing program does not trigger
arch_pick_mmap_layout(), so rewrite tests to change RLIMIT_STACK from a
script before executing tests. RLIMIT_STACK of infinity forces bottomup
mmap allocation.
- Make arch_get_mmap_base macro more readible by extracting out the rnd
calculation.
- Use MMAP_MIN_VA_BITS in TASK_UNMAPPED_BASE to support case when mmap
attempts to allocate address smaller than DEFAULT_MAP_WINDOW.
- Fix incorrect wording in documentation.
v6:
- Rebase onto the correct base
v5:
- Minor wording change in documentation
- Change some parenthesis in arch_get_mmap_ macros
- Added case for addr==0 in arch_get_mmap_ because without this, programs would
crash if RLIMIT_STACK was modified before executing the program. This was
tested using the libhugetlbfs tests.
v4:
- Split testcases/document patch into test cases, in-code documentation, and
formal documentation patches
- Modified the mmap_base macro to be more legible and better represent memory
layout
- Fixed documentation to better reflect the implmentation
- Renamed DEFAULT_VA_BITS to MMAP_VA_BITS
- Added additional test case for rlimit changes
---
Charlie Jenkins (4):
RISC-V: mm: Restrict address space for sv39,sv48,sv57
RISC-V: mm: Add tests for RISC-V mm
RISC-V: mm: Update pgtable comment documentation
RISC-V: mm: Document mmap changes
Documentation/riscv/vm-layout.rst | 22 +++++++
arch/riscv/include/asm/elf.h | 2 +-
arch/riscv/include/asm/pgtable.h | 33 ++++++++--
arch/riscv/include/asm/processor.h | 52 +++++++++++++--
tools/testing/selftests/riscv/Makefile | 2 +-
tools/testing/selftests/riscv/mm/.gitignore | 2 +
tools/testing/selftests/riscv/mm/Makefile | 15 +++++
.../riscv/mm/testcases/mmap_bottomup.c | 35 ++++++++++
.../riscv/mm/testcases/mmap_default.c | 35 ++++++++++
.../selftests/riscv/mm/testcases/mmap_test.h | 64 +++++++++++++++++++
.../selftests/riscv/mm/testcases/run_mmap.sh | 12 ++++
11 files changed, 261 insertions(+), 13 deletions(-)
create mode 100644 tools/testing/selftests/riscv/mm/.gitignore
create mode 100644 tools/testing/selftests/riscv/mm/Makefile
create mode 100644 tools/testing/selftests/riscv/mm/testcases/mmap_bottomup.c
create mode 100644 tools/testing/selftests/riscv/mm/testcases/mmap_default.c
create mode 100644 tools/testing/selftests/riscv/mm/testcases/mmap_test.h
create mode 100755 tools/testing/selftests/riscv/mm/testcases/run_mmap.sh
--
2.34.1
Hi folks,
This series implements the functionality of delivering IO page faults to
user space through the IOMMUFD framework for nested translation. Nested
translation is a hardware feature that supports two-stage translation
tables for IOMMU. The second-stage translation table is managed by the
host VMM, while the first-stage translation table is owned by user
space. This allows user space to control the IOMMU mappings for its
devices.
When an IO page fault occurs on the first-stage translation table, the
IOMMU hardware can deliver the page fault to user space through the
IOMMUFD framework. User space can then handle the page fault and respond
to the device top-down through the IOMMUFD. This allows user space to
implement its own IO page fault handling policies.
User space indicates its capability of handling IO page faults by
setting the IOMMU_HWPT_ALLOC_IOPF_CAPABLE flag when allocating a
hardware page table (HWPT). IOMMUFD will then set up its infrastructure
for page fault delivery. On a successful return of HWPT allocation, the
user can retrieve and respond to page faults by reading and writing to
the file descriptor (FD) returned in out_fault_fd.
The iommu selftest framework has been updated to test the IO page fault
delivery and response functionality.
This series is based on the latest implementation of nested translation
under discussion [1] and the page fault handling framework refactoring in
the IOMMU core [2].
The series and related patches are available on GitHub: [3]
[1] https://lore.kernel.org/linux-iommu/20230921075138.124099-1-yi.l.liu@intel.…
[2] https://lore.kernel.org/linux-iommu/20230928042734.16134-1-baolu.lu@linux.i…
[3] https://github.com/LuBaolu/intel-iommu/commits/iommufd-io-pgfault-delivery-…
Best regards,
baolu
Change log:
v2:
- Move all iommu refactoring patches into a sparated series and discuss
it in a different thread. The latest patch series [v6] is available at
https://lore.kernel.org/linux-iommu/20230928042734.16134-1-baolu.lu@linux.i…
- We discussed the timeout of the pending page fault messages. We
agreed that we shouldn't apply any timeout policy for the page fault
handling in user space.
https://lore.kernel.org/linux-iommu/20230616113232.GA84678@myrica/
- Jason suggested that we adopt a simple file descriptor interface for
reading and responding to I/O page requests, so that user space
applications can improve performance using io_uring.
https://lore.kernel.org/linux-iommu/ZJWjD1ajeem6pK3I@ziepe.ca/
v1: https://lore.kernel.org/linux-iommu/20230530053724.232765-1-baolu.lu@linux.…
Lu Baolu (6):
iommu: Add iommu page fault cookie helpers
iommufd: Add iommu page fault uapi data
iommufd: Initializing and releasing IO page fault data
iommufd: Deliver fault messages to user space
iommufd/selftest: Add IOMMU_TEST_OP_TRIGGER_IOPF test support
iommufd/selftest: Add coverage for IOMMU_TEST_OP_TRIGGER_IOPF
include/linux/iommu.h | 9 +
drivers/iommu/iommu-priv.h | 15 +
drivers/iommu/iommufd/iommufd_private.h | 12 +
drivers/iommu/iommufd/iommufd_test.h | 8 +
include/uapi/linux/iommufd.h | 65 +++++
tools/testing/selftests/iommu/iommufd_utils.h | 66 ++++-
drivers/iommu/io-pgfault.c | 50 ++++
drivers/iommu/iommufd/device.c | 69 ++++-
drivers/iommu/iommufd/hw_pagetable.c | 260 +++++++++++++++++-
drivers/iommu/iommufd/selftest.c | 56 ++++
tools/testing/selftests/iommu/iommufd.c | 24 +-
.../selftests/iommu/iommufd_fail_nth.c | 2 +-
12 files changed, 620 insertions(+), 16 deletions(-)
--
2.34.1
This patchset moves the current kernel testing livepatch modules from
lib/livepatches to tools/testing/selftest/livepatch/test_modules, and compiles
them as out-of-tree modules before testing.
There is also a new test being added. This new test exercises multiple processes
calling a syscall, while a livepatch patched the syscall.
Why this move is an improvement:
* The modules are now compiled as out-of-tree modules against the current
running kernel, making them capable of being tested on different systems with
newer or older kernels.
* Such approach now needs kernel-devel package to be installed, since they are
out-of-tree modules. These can be generated by running "make rpm-pkg" in the
kernel source.
What needs to be solved:
* Currently gen_tar only packages the resulting binaries of the tests, and not
the sources. For the current approach, the newly added modules would be
compiled and then packaged. It works when testing on a system with the same
kernel version. But it will fail when running on a machine with different kernel
version, since module was compiled against the kernel currently running.
This is not a new problem, just aligning the expectations. For the current
approach to be truly system agnostic gen_tar would need to include the module
and program sources to be compiled in the target systems.
I'm sending the patches now so it can be discussed before Plumbers.
Thanks in advance!
Marcos
To: Shuah Khan <shuah(a)kernel.org>
To: Jonathan Corbet <corbet(a)lwn.net>
To: Heiko Carstens <hca(a)linux.ibm.com>
To: Vasily Gorbik <gor(a)linux.ibm.com>
To: Alexander Gordeev <agordeev(a)linux.ibm.com>
To: Christian Borntraeger <borntraeger(a)linux.ibm.com>
To: Sven Schnelle <svens(a)linux.ibm.com>
To: Josh Poimboeuf <jpoimboe(a)kernel.org>
To: Jiri Kosina <jikos(a)kernel.org>
To: Miroslav Benes <mbenes(a)suse.cz>
To: Petr Mladek <pmladek(a)suse.com>
To: Joe Lawrence <joe.lawrence(a)redhat.com>
Cc: linux-kselftest(a)vger.kernel.org
Cc: linux-doc(a)vger.kernel.org
Cc: linux-kernel(a)vger.kernel.org
Cc: linux-s390(a)vger.kernel.org
Cc: live-patching(a)vger.kernel.org
Signed-off-by: Marcos Paulo de Souza <mpdesouza(a)suse.com>
Changes in v3:
* Rebased on top of v6.6-rc5
* The commits messages were improved (Thanks Petr!)
* Created TEST_GEN_MODS_DIR variable to point to a directly that contains kernel
modules, and adapt selftests to build it before running the test.
* Moved test_klp-call_getpid out of test_programs, since the gen_tar
would just copy the generated test programs to the livepatches dir,
and so scripts relying on test_programs/test_klp-call_getpid will fail.
* Added a module_param for klp_pids, describing it's usage.
* Simplified the call_getpid program to ignore the return of getpid syscall,
since we only want to make sure the process transitions correctly to the
patched stated
* The test-syscall.sh not prints a log message showing the number of remaining
processes to transition into to livepatched state, and check_output expects it
to be 0.
* Added MODULE_AUTHOR and MODULE_DESCRIPTION to test_klp_syscall.c
The v2 can be seen here:
https://lore.kernel.org/linux-kselftest/20220630141226.2802-1-mpdesouza@sus…
---
Marcos Paulo de Souza (3):
kselftests: lib.mk: Add TEST_GEN_MODS_DIR variable
livepatch: Move tests from lib/livepatch to selftests/livepatch
selftests: livepatch: Test livepatching a heavily called syscall
Documentation/dev-tools/kselftest.rst | 4 +
arch/s390/configs/debug_defconfig | 1 -
arch/s390/configs/defconfig | 1 -
lib/Kconfig.debug | 22 ----
lib/Makefile | 2 -
lib/livepatch/Makefile | 14 ---
tools/testing/selftests/lib.mk | 20 +++-
tools/testing/selftests/livepatch/Makefile | 5 +-
tools/testing/selftests/livepatch/README | 17 +--
tools/testing/selftests/livepatch/config | 1 -
tools/testing/selftests/livepatch/functions.sh | 34 +++---
.../testing/selftests/livepatch/test-callbacks.sh | 50 ++++-----
tools/testing/selftests/livepatch/test-ftrace.sh | 6 +-
.../testing/selftests/livepatch/test-livepatch.sh | 10 +-
.../selftests/livepatch/test-shadow-vars.sh | 2 +-
tools/testing/selftests/livepatch/test-state.sh | 18 ++--
tools/testing/selftests/livepatch/test-syscall.sh | 53 ++++++++++
tools/testing/selftests/livepatch/test-sysfs.sh | 6 +-
.../selftests/livepatch/test_klp-call_getpid.c | 44 ++++++++
.../selftests/livepatch/test_modules/Makefile | 20 ++++
.../test_modules}/test_klp_atomic_replace.c | 0
.../test_modules}/test_klp_callbacks_busy.c | 0
.../test_modules}/test_klp_callbacks_demo.c | 0
.../test_modules}/test_klp_callbacks_demo2.c | 0
.../test_modules}/test_klp_callbacks_mod.c | 0
.../livepatch/test_modules}/test_klp_livepatch.c | 0
.../livepatch/test_modules}/test_klp_shadow_vars.c | 0
.../livepatch/test_modules}/test_klp_state.c | 0
.../livepatch/test_modules}/test_klp_state2.c | 0
.../livepatch/test_modules}/test_klp_state3.c | 0
.../livepatch/test_modules/test_klp_syscall.c | 116 +++++++++++++++++++++
31 files changed, 325 insertions(+), 121 deletions(-)
---
base-commit: 6489bf2e1df1c84e9bcd4694029ff35b39fd3397
change-id: 20231031-send-lp-kselftests-4c917dcd4565
Best regards,
--
Marcos Paulo de Souza <mpdesouza(a)suse.com>
virtio-net have two usage of hashes: one is RSS and another is hash
reporting. Conventionally the hash calculation was done by the VMM.
However, computing the hash after the queue was chosen defeats the
purpose of RSS.
Another approach is to use eBPF steering program. This approach has
another downside: it cannot report the calculated hash due to the
restrictive nature of eBPF.
Extend the steering program feature by introducing a dedicated program
type: BPF_PROG_TYPE_VNET_HASH. This program type is capable to report
the hash value and the queue to use at the same time.
This is a rewrite of a RFC patch series submitted by Yuri Benditovich that
incorporates feedbacks for the series and V1 of this series:
https://lore.kernel.org/lkml/20210112194143.1494-1-yuri.benditovich@daynix.…
QEMU patched to use this new feature is available at:
https://github.com/daynix/qemu/tree/akihikodaki/bpf
The QEMU patches will soon be submitted to the upstream as RFC too.
V1 -> V2:
Changed to introduce a new BPF program type.
Akihiko Odaki (7):
bpf: Introduce BPF_PROG_TYPE_VNET_HASH
bpf: Add vnet_hash members to __sk_buff
skbuff: Introduce SKB_EXT_TUN_VNET_HASH
virtio_net: Add virtio_net_hdr_v1_hash_from_skb()
tun: Support BPF_PROG_TYPE_VNET_HASH
selftests/bpf: Test BPF_PROG_TYPE_VNET_HASH
vhost_net: Support VIRTIO_NET_F_HASH_REPORT
Documentation/bpf/bpf_prog_run.rst | 1 +
Documentation/bpf/libbpf/program_types.rst | 2 +
drivers/net/tun.c | 158 +++++--
drivers/vhost/net.c | 16 +-
include/linux/bpf_types.h | 2 +
include/linux/filter.h | 7 +
include/linux/skbuff.h | 10 +
include/linux/virtio_net.h | 22 +
include/uapi/linux/bpf.h | 5 +
kernel/bpf/verifier.c | 6 +
net/core/filter.c | 86 +++-
net/core/skbuff.c | 3 +
tools/include/uapi/linux/bpf.h | 5 +
tools/lib/bpf/libbpf.c | 2 +
tools/testing/selftests/bpf/config | 1 +
tools/testing/selftests/bpf/config.aarch64 | 1 -
.../selftests/bpf/prog_tests/vnet_hash.c | 385 ++++++++++++++++++
tools/testing/selftests/bpf/progs/vnet_hash.c | 16 +
18 files changed, 681 insertions(+), 47 deletions(-)
create mode 100644 tools/testing/selftests/bpf/prog_tests/vnet_hash.c
create mode 100644 tools/testing/selftests/bpf/progs/vnet_hash.c
--
2.42.0
Hi,
Changes since v2 [1]:
* Added a new patch (sent separately earlier) at the end, to error out
if "make headers" has not yet been run.
* Reworked and simplified the uffd movement patch. Now it only moves
some uffd*() routines, not all, and doesn't have to touch the Makefile
at all. This lighter touch also allowed me to drop the "move psize(),
pshift() into vm_utils.c" entirely. I expect Peter Xu will be a little
happier with this new approach.
* Fixed the commit description for the MADV_COLLAPSE patch.
* Added more Reviewed-by tags from David Hildenbrand and Peter Xu.
[1] https://lore.kernel.org/all/20230603021558.95299-1-jhubbard@nvidia.com/
John Hubbard (11):
selftests/mm: fix uffd-stress unused function warning
selftests/mm: fix unused variable warnings in hugetlb-madvise.c,
migration.c
selftests/mm: fix "warning: expression which evaluates to zero..." in
mlock2-tests.c
selftests/mm: fix invocation of tests that are run via shell scripts
selftests/mm: .gitignore: add mkdirty, va_high_addr_switch
selftests/mm: fix two -Wformat-security warnings in uffd builds
selftests/mm: fix a "possibly uninitialized" warning in pkey-x86.h
selftests/mm: fix build failures due to missing MADV_COLLAPSE
selftests/mm: move certain uffd*() routines from vm_util.c to
uffd-common.c
Documentation: kselftest: "make headers" is a prerequisite
selftests: error out if kernel header files are not yet built
Documentation/dev-tools/kselftest.rst | 1 +
tools/testing/selftests/lib.mk | 36 +++++++++++-
tools/testing/selftests/mm/.gitignore | 2 +
tools/testing/selftests/mm/cow.c | 7 ---
tools/testing/selftests/mm/hugetlb-madvise.c | 8 ++-
tools/testing/selftests/mm/khugepaged.c | 10 ----
tools/testing/selftests/mm/migration.c | 5 +-
tools/testing/selftests/mm/mlock2-tests.c | 1 -
tools/testing/selftests/mm/pkey-x86.h | 2 +-
tools/testing/selftests/mm/run_vmtests.sh | 6 +-
tools/testing/selftests/mm/uffd-common.c | 59 ++++++++++++++++++++
tools/testing/selftests/mm/uffd-common.h | 5 ++
tools/testing/selftests/mm/uffd-stress.c | 10 ----
tools/testing/selftests/mm/uffd-unit-tests.c | 16 ++----
tools/testing/selftests/mm/vm_util.c | 59 --------------------
tools/testing/selftests/mm/vm_util.h | 14 +++--
16 files changed, 130 insertions(+), 111 deletions(-)
base-commit: f8dba31b0a826e691949cd4fdfa5c30defaac8c5
--
2.40.1