From: Zi Yan <ziy(a)nvidia.com>
Hi all,
File folio supports any order and people would like to support flexible orders
for anonymous folio[1] too. Currently, split_huge_page() only splits a huge
page to order-0 pages, but splitting to orders higher than 0 is also useful.
This patchset adds support for splitting a huge page to any lower order pages
and uses it during folio truncate operations.
The patchset is on top of mm-everything-2023-03-19-21-50.
* Patch 1 and 2 add new_order parameter split_page_memcg() and
split_page_owner() and prepare for upcoming changes.
* Patch 3 adds split_huge_page_to_list_to_order() to split a huge page
to any lower order. The original split_huge_page_to_list() calls
split_huge_page_to_list_to_order() with new_order = 0.
* Patch 4 uses split_huge_page_to_list_to_order() in large pagecache folio
truncation instead of split the large folio all the way down to order-0.
* Patch 5 adds a test API to debugfs and test cases in
split_huge_page_test selftests.
Comments and/or suggestions are welcome.
[1] https://lore.kernel.org/linux-mm/Y%2FblF0GIunm+pRIC@casper.infradead.org/
Zi Yan (5):
mm: memcg: make memcg huge page split support any order split.
mm: page_owner: add support for splitting to any order in split
page_owner.
mm: thp: split huge page to any lower order pages.
mm: truncate: split huge page cache page to a non-zero order if
possible.
mm: huge_memory: enable debugfs to split huge pages to any order.
include/linux/huge_mm.h | 10 +-
include/linux/memcontrol.h | 5 +-
include/linux/page_owner.h | 12 +-
mm/huge_memory.c | 138 ++++++++---
mm/memcontrol.c | 8 +-
mm/page_alloc.c | 8 +-
mm/page_owner.c | 11 +-
mm/truncate.c | 21 +-
.../selftests/mm/split_huge_page_test.c | 225 +++++++++++++++++-
9 files changed, 368 insertions(+), 70 deletions(-)
--
2.39.2
Add a new selftest for CMMA migration. Also fix a small issue found during
development of the test.
Nico Boehr (2):
KVM: s390: selftests: add selftest for CMMA migration
KVM: s390: fix KVM_S390_GET_CMMA_BITS for GFNs in memslot holes
arch/s390/kvm/kvm-s390.c | 4 +
tools/testing/selftests/kvm/Makefile | 1 +
tools/testing/selftests/kvm/s390x/cmma_test.c | 679 ++++++++++++++++++
3 files changed, 684 insertions(+)
create mode 100644 tools/testing/selftests/kvm/s390x/cmma_test.c
--
2.39.1
When reporting errors or skips we currently include the diagnostic message
indicating why we're failing or skipping. This isn't ideal since KTAP
defines the entire print as the test name, so if there's an error then test
systems won't detect the test as being the same one as a passing test. Move
the diagnostic to a separate ksft_print_msg() to avoid this issue, the test
name part will always be the same for passes, fails and skips and the
diagnostic information is still displayed.
Signed-off-by: Mark Brown <broonie(a)kernel.org>
---
tools/testing/selftests/alsa/pcm-test.c | 13 +++++++------
1 file changed, 7 insertions(+), 6 deletions(-)
diff --git a/tools/testing/selftests/alsa/pcm-test.c b/tools/testing/selftests/alsa/pcm-test.c
index 58b525a4a32c..bab56ea67e89 100644
--- a/tools/testing/selftests/alsa/pcm-test.c
+++ b/tools/testing/selftests/alsa/pcm-test.c
@@ -489,17 +489,18 @@ static void test_pcm_time(struct pcm_data *data, enum test_class class,
}
if (!skip)
- ksft_test_result(pass, "%s.%s.%d.%d.%d.%s%s%s\n",
+ ksft_test_result(pass, "%s.%s.%d.%d.%d.%s\n",
test_class_name, test_name,
data->card, data->device, data->subdevice,
- snd_pcm_stream_name(data->stream),
- msg[0] ? " " : "", msg);
+ snd_pcm_stream_name(data->stream));
else
- ksft_test_result_skip("%s.%s.%d.%d.%d.%s%s%s\n",
+ ksft_test_result_skip("%s.%s.%d.%d.%d.%s\n",
test_class_name, test_name,
data->card, data->device, data->subdevice,
- snd_pcm_stream_name(data->stream),
- msg[0] ? " " : "", msg);
+ snd_pcm_stream_name(data->stream));
+
+ if (msg[0])
+ ksft_print_msg("%s\n", msg);
pthread_mutex_unlock(&results_lock);
---
base-commit: e8d018dd0257f744ca50a729e3d042cf2ec9da65
change-id: 20230323-alsa-pcm-test-names-bcd31b586ca9
Best regards,
--
Mark Brown <broonie(a)kernel.org>
This is useful when using nolibc for security-critical tools.
Using nolibc has the advantage that the code is easily auditable and
sandboxable with seccomp as no unexpected syscalls are used.
Using compiler-assistent stack protection provides another security
mechanism.
For this to work the compiler and libc have to collaborate.
This patch adds the following parts to nolibc that are required by the
compiler:
* __stack_chk_guard: random sentinel value
* __stack_chk_fail: handler for detected stack smashes
In addition an initialization function is added that randomizes the
sentinel value.
Only support for global guards is implemented.
Register guards are useful in multi-threaded context which nolibc does
not provide support for.
Link: https://lwn.net/Articles/584225/
Signed-off-by: Thomas Weißschuh <linux(a)weissschuh.net>
---
Changes in v2:
- Code and comments style fixes
- Only use raw syscalls in stackprotector functions
- Remove need for dedicated entrypoint and exec() during tests
- Add more rationale
- Shuffle some code around between commits
- Provide compatibility with the -fno-stack-protector patch
- Remove RFC status
- Link to v1: https://lore.kernel.org/r/20230223-nolibc-stackprotector-v1-0-3e74d81b3f21@…
This series is based on the current rcu/dev branch of Pauls rcu tree.
---
Thomas Weißschuh (8):
tools/nolibc: add definitions for standard fds
tools/nolibc: add helpers for wait() signal exits
tools/nolibc: tests: constify test_names
tools/nolibc: add support for stack protector
tools/nolibc: tests: fold in no-stack-protector cflags
tools/nolibc: tests: add test for -fstack-protector
tools/nolibc: i386: add stackprotector support
tools/nolibc: x86_64: add stackprotector support
tools/include/nolibc/Makefile | 4 +-
tools/include/nolibc/arch-i386.h | 7 ++-
tools/include/nolibc/arch-x86_64.h | 5 +++
tools/include/nolibc/nolibc.h | 1 +
tools/include/nolibc/stackprotector.h | 53 +++++++++++++++++++++++
tools/include/nolibc/types.h | 2 +
tools/include/nolibc/unistd.h | 5 +++
tools/testing/selftests/nolibc/Makefile | 11 ++++-
tools/testing/selftests/nolibc/nolibc-test.c | 64 ++++++++++++++++++++++++++--
9 files changed, 144 insertions(+), 8 deletions(-)
---
base-commit: a9b8406e51603238941dbc6fa1437f8915254ebb
change-id: 20230223-nolibc-stackprotector-d4d5f48ff771
Best regards,
--
Thomas Weißschuh <linux(a)weissschuh.net>
Hi,
This series adds initial KVM selftests support for powerpc
(64-bit, BookS). It spans 3 maintainers but it does not really
affect arch/powerpc, and it is well contained in selftests
code, just touches some makefiles and a tiny bit headers so
conflicts should be unlikely and trivial.
Hey Paolo and KVM group, if you didn't take the v1 series yet, could
you please take this instead. Otherwise I can send an incremental
fixup.
Since v1:
- r2 (TOC) was not being set for guest code
- MSR[VSX] was not being set for guest code
- Proper guest interrupt handling instead of quick hack that
just made a ucall out to host.
- Adjust subject to better match kvm selftests convention.
Thanks,
Nick
Nicholas Piggin (2):
KVM: PPC: selftests: implement support for powerpc
KVM: PPC: selftests: basic sanity tests
tools/testing/selftests/kvm/Makefile | 15 +
.../selftests/kvm/include/kvm_util_base.h | 13 +
.../selftests/kvm/include/powerpc/hcall.h | 22 +
.../selftests/kvm/include/powerpc/ppc_asm.h | 17 +
.../selftests/kvm/include/powerpc/processor.h | 32 ++
tools/testing/selftests/kvm/lib/kvm_util.c | 10 +
.../selftests/kvm/lib/powerpc/handlers.S | 96 ++++
.../testing/selftests/kvm/lib/powerpc/hcall.c | 45 ++
.../selftests/kvm/lib/powerpc/processor.c | 411 ++++++++++++++++++
.../testing/selftests/kvm/lib/powerpc/ucall.c | 30 ++
tools/testing/selftests/kvm/powerpc/helpers.h | 46 ++
.../testing/selftests/kvm/powerpc/null_test.c | 166 +++++++
.../selftests/kvm/powerpc/rtas_hcall.c | 146 +++++++
13 files changed, 1049 insertions(+)
create mode 100644 tools/testing/selftests/kvm/include/powerpc/hcall.h
create mode 100644 tools/testing/selftests/kvm/include/powerpc/ppc_asm.h
create mode 100644 tools/testing/selftests/kvm/include/powerpc/processor.h
create mode 100644 tools/testing/selftests/kvm/lib/powerpc/handlers.S
create mode 100644 tools/testing/selftests/kvm/lib/powerpc/hcall.c
create mode 100644 tools/testing/selftests/kvm/lib/powerpc/processor.c
create mode 100644 tools/testing/selftests/kvm/lib/powerpc/ucall.c
create mode 100644 tools/testing/selftests/kvm/powerpc/helpers.h
create mode 100644 tools/testing/selftests/kvm/powerpc/null_test.c
create mode 100644 tools/testing/selftests/kvm/powerpc/rtas_hcall.c
--
2.37.2
These patches are based on next-20230307 and UFFD_FEATURE_WP_UNPOPULATED
patches from Peter.
*Changes in v11*
- Rebase on top of next-20230307
- Base patches on UFFD_FEATURE_WP_UNPOPULATED (https://lore.kernel.org/all/20230306213925.617814-1-peterx@redhat.com)
- Do a lot of cosmetic changes and review updates
- Remove ENGAGE_WP + ! GET operation as it can be performed with UFFDIO_WRITEPROTECT
*Changes in v10*
- Add specific condition to return error if hugetlb is used with wp
async
- Move changes in tools/include/uapi/linux/fs.h to separate patch
- Add documentation
*Changes in v9:*
- Correct fault resolution for userfaultfd wp async
- Fix build warnings and errors which were happening on some configs
- Simplify pagemap ioctl's code
*Changes in v8:*
- Update uffd async wp implementation
- Improve PAGEMAP_IOCTL implementation
*Changes in v7:*
- Add uffd wp async
- Update the IOCTL to use uffd under the hood instead of soft-dirty
flags
Hello,
Note:
Soft-dirty pages and pages which have been written-to are synonyms. As
kernel already has soft-dirty feature inside which we have given up to
use, we are using written-to terminology while using UFFD async WP under
the hood.
This IOCTL, PAGEMAP_SCAN on pagemap file can be used to get and/or clear
the info about page table entries. The following operations are
supported in this ioctl:
- Get the information if the pages have been written-to (PAGE_IS_WRITTEN),
file mapped (PAGE_IS_FILE), present (PAGE_IS_PRESENT) or swapped
(PAGE_IS_SWAPPED).
- Write-protect the pages (PAGEMAP_WP_ENGAGE) to start finding which
pages have been written-to.
- Find pages which have been written-to and write protect the pages
(atomic PAGE_IS_WRITTEN + PAGEMAP_WP_ENGAGE)
It is possible to find and clear soft-dirty pages entirely in userspace.
But it isn't efficient:
- The mprotect and SIGSEGV handler for bookkeeping
- The userfaultfd wp (synchronous) with the handler for bookkeeping
Some benchmarks can be seen here[1]. This series adds features that weren't
present earlier:
- There is no atomic get soft-dirty/Written-to status and clear present in
the kernel.
- The pages which have been written-to can not be found in accurate way.
(Kernel's soft-dirty PTE bit + sof_dirty VMA bit shows more soft-dirty
pages than there actually are.)
Historically, soft-dirty PTE bit tracking has been used in the CRIU
project. The procfs interface is enough for finding the soft-dirty bit
status and clearing the soft-dirty bit of all the pages of a process.
We have the use case where we need to track the soft-dirty PTE bit for
only specific pages on-demand. We need this tracking and clear mechanism
of a region of memory while the process is running to emulate the
getWriteWatch() syscall of Windows.
*(Moved to using UFFD instead of soft-dirtyi feature to find pages which
have been written-to from v7 patch series)*:
Stop using the soft-dirty flags for finding which pages have been
written to. It is too delicate and wrong as it shows more soft-dirty
pages than the actual soft-dirty pages. There is no interest in
correcting it [2][3] as this is how the feature was written years ago.
It shouldn't be updated to changed behaviour. Peter Xu has suggested
using the async version of the UFFD WP [4] as it is based inherently
on the PTEs.
So in this patch series, I've added a new mode to the UFFD which is
asynchronous version of the write protect. When this variant of the
UFFD WP is used, the page faults are resolved automatically by the
kernel. The pages which have been written-to can be found by reading
pagemap file (!PM_UFFD_WP). This feature can be used successfully to
find which pages have been written to from the time the pages were
write protected. This works just like the soft-dirty flag without
showing any extra pages which aren't soft-dirty in reality.
The information related to pages if the page is file mapped, present and
swapped is required for the CRIU project [5][6]. The addition of the
required mask, any mask, excluded mask and return masks are also required
for the CRIU project [5].
The IOCTL returns the addresses of the pages which match the specific
masks. The page addresses are returned in struct page_region in a compact
form. The max_pages is needed to support a use case where user only wants
to get a specific number of pages. So there is no need to find all the
pages of interest in the range when max_pages is specified. The IOCTL
returns when the maximum number of the pages are found. The max_pages is
optional. If max_pages is specified, it must be equal or greater than the
vec_size. This restriction is needed to handle worse case when one
page_region only contains info of one page and it cannot be compacted.
This is needed to emulate the Windows getWriteWatch() syscall.
The patch series include the detailed selftest which can be used as an
example for the uffd async wp test and PAGEMAP_IOCTL. It shows the
interface usages as well.
[1] https://lore.kernel.org/lkml/54d4c322-cd6e-eefd-b161-2af2b56aae24@collabora…
[2] https://lore.kernel.org/all/20221220162606.1595355-1-usama.anjum@collabora.…
[3] https://lore.kernel.org/all/20221122115007.2787017-1-usama.anjum@collabora.…
[4] https://lore.kernel.org/all/Y6Hc2d+7eTKs7AiH@x1n
[5] https://lore.kernel.org/all/YyiDg79flhWoMDZB@gmail.com/
[6] https://lore.kernel.org/all/20221014134802.1361436-1-mdanylo@google.com/
Regards,
Muhammad Usama Anjum
Muhammad Usama Anjum (7):
userfaultfd: Add UFFD WP Async support
userfaultfd: Define dummy uffd_wp_range()
userfaultfd: update documentation to describe UFFD_FEATURE_WP_ASYNC
fs/proc/task_mmu: Implement IOCTL to get and optionally clear info
about PTEs
tools headers UAPI: Update linux/fs.h with the kernel sources
mm/pagemap: add documentation of PAGEMAP_SCAN IOCTL
selftests: mm: add pagemap ioctl tests
Documentation/admin-guide/mm/pagemap.rst | 56 ++
Documentation/admin-guide/mm/userfaultfd.rst | 21 +
fs/proc/task_mmu.c | 366 ++++++++
fs/userfaultfd.c | 25 +-
include/linux/userfaultfd_k.h | 14 +
include/uapi/linux/fs.h | 53 ++
include/uapi/linux/userfaultfd.h | 11 +-
mm/memory.c | 27 +-
tools/include/uapi/linux/fs.h | 53 ++
tools/testing/selftests/mm/.gitignore | 1 +
tools/testing/selftests/mm/Makefile | 4 +-
tools/testing/selftests/mm/config | 1 +
tools/testing/selftests/mm/pagemap_ioctl.c | 920 +++++++++++++++++++
tools/testing/selftests/mm/run_vmtests.sh | 4 +
14 files changed, 1549 insertions(+), 7 deletions(-)
create mode 100644 tools/testing/selftests/mm/pagemap_ioctl.c
mode change 100644 => 100755 tools/testing/selftests/mm/run_vmtests.sh
--
2.39.2
Although there is a provision for 52 bit VA on arm64 platform, it remains
unutilised and higher addresses are not allocated. In order to accommodate
4PB [2^52] virtual address space where supported, NR_CHUNKS_HIGH is changed
accordingly.
Array holding addresses is changed from static allocation to dynamic
allocation to accommodate its voluminous nature which otherwise might
overflow the stack.
Cc: Andrew Morton <akpm(a)linux-foundation.org>
Cc: David Hildenbrand <david(a)redhat.com>
Cc: Shuah Khan <shuah(a)kernel.org>
Cc: linux-mm(a)kvack.org
Cc: linux-kselftest(a)vger.kernel.org
Cc: linux-kernel(a)vger.kernel.org
Signed-off-by: Chaitanya S Prakash <chaitanyas.prakash(a)arm.com>
---
tools/testing/selftests/mm/virtual_address_range.c | 14 ++++++++++----
1 file changed, 10 insertions(+), 4 deletions(-)
diff --git a/tools/testing/selftests/mm/virtual_address_range.c b/tools/testing/selftests/mm/virtual_address_range.c
index 50564512c5ee..bae0ceaf95b1 100644
--- a/tools/testing/selftests/mm/virtual_address_range.c
+++ b/tools/testing/selftests/mm/virtual_address_range.c
@@ -36,13 +36,15 @@
* till it reaches 512TB. One with size 128TB and the
* other being 384TB.
*
- * On Arm64 the address space is 256TB and no high mappings
- * are supported so far.
+ * On Arm64 the address space is 256TB and support for
+ * high mappings up to 4PB virtual address space has
+ * been added.
*/
#define NR_CHUNKS_128TB ((128 * SZ_1TB) / MAP_CHUNK_SIZE) /* Number of chunks for 128TB */
#define NR_CHUNKS_256TB (NR_CHUNKS_128TB * 2UL)
#define NR_CHUNKS_384TB (NR_CHUNKS_128TB * 3UL)
+#define NR_CHUNKS_3840TB (NR_CHUNKS_128TB * 30UL)
#define ADDR_MARK_128TB (1UL << 47) /* First address beyond 128TB */
#define ADDR_MARK_256TB (1UL << 48) /* First address beyond 256TB */
@@ -51,7 +53,7 @@
#define HIGH_ADDR_MARK ADDR_MARK_256TB
#define HIGH_ADDR_SHIFT 49
#define NR_CHUNKS_LOW NR_CHUNKS_256TB
-#define NR_CHUNKS_HIGH 0
+#define NR_CHUNKS_HIGH NR_CHUNKS_3840TB
#else
#define HIGH_ADDR_MARK ADDR_MARK_128TB
#define HIGH_ADDR_SHIFT 48
@@ -101,7 +103,7 @@ static int validate_lower_address_hint(void)
int main(int argc, char *argv[])
{
char *ptr[NR_CHUNKS_LOW];
- char *hptr[NR_CHUNKS_HIGH];
+ char **hptr;
char *hint;
unsigned long i, lchunks, hchunks;
@@ -119,6 +121,9 @@ int main(int argc, char *argv[])
return 1;
}
lchunks = i;
+ hptr = (char **) calloc(NR_CHUNKS_HIGH, sizeof(char *));
+ if (hptr == NULL)
+ return 1;
for (i = 0; i < NR_CHUNKS_HIGH; i++) {
hint = hind_addr();
@@ -139,5 +144,6 @@ int main(int argc, char *argv[])
for (i = 0; i < hchunks; i++)
munmap(hptr[i], MAP_CHUNK_SIZE);
+ free(hptr);
return 0;
}
--
2.30.2