This series is built on top of Fuad's v7 "mapping guest_memfd backed memory at the host" [1].
With James's KVM userfault [2], it is possible to handle stage-2 faults in guest_memfd in userspace. However, KVM itself also triggers faults in guest_memfd in some cases, for example in PV interfaces like kvmclock and PV EOI, and in the page table walking code when fetching the MMIO instruction on x86. It was agreed in the guest_memfd upstream call on 23 Jan 2025 [3] that KVM would access those pages via userspace page tables. In order for such faults to be handled in userspace, guest_memfd needs to support userfaultfd.
Changes since v2 [4]:
 - James: Fix sgp type when calling shmem_get_folio_gfp
 - James: Improved vm_ops->fault() error handling
 - James: Add and make use of the can_userfault() VMA operation
 - James: Add UFFD_FEATURE_MINOR_GUEST_MEMFD feature flag
 - James: Fix typos and add more checks in the test
Nikita
[1] https://lore.kernel.org/kvm/20250318161823.4005529-1-tabba@google.com/T/
[2] https://lore.kernel.org/kvm/20250109204929.1106563-1-jthoughton@google.com/T...
[3] https://docs.google.com/document/d/1M6766BzdY1Lhk7LiR5IqVR8B8mG3cr-cxTxOrAos...
[4] https://lore.kernel.org/kvm/20250402160721.97596-1-kalyazin@amazon.com/T/
Nikita Kalyazin (6):
  mm: userfaultfd: generic continue for non hugetlbfs
  mm: provide can_userfault vma operation
  mm: userfaultfd: use can_userfault vma operation
  KVM: guest_memfd: add support for userfaultfd minor
  mm: userfaultfd: add UFFD_FEATURE_MINOR_GUEST_MEMFD
  KVM: selftests: test userfaultfd minor for guest_memfd
 fs/userfaultfd.c                              |  3 +-
 include/linux/mm.h                            |  5 +
 include/linux/mm_types.h                      |  4 +
 include/linux/userfaultfd_k.h                 | 10 +-
 include/uapi/linux/userfaultfd.h              |  8 +-
 mm/hugetlb.c                                  |  9 +-
 mm/shmem.c                                    | 17 +++-
 mm/userfaultfd.c                              | 47 ++++++---
 .../testing/selftests/kvm/guest_memfd_test.c  | 99 +++++++++++++++++++
 virt/kvm/guest_memfd.c                        | 10 ++
 10 files changed, 188 insertions(+), 24 deletions(-)
base-commit: 3cc51efc17a2c41a480eed36b31c1773936717e0
Remove shmem-specific code from the UFFDIO_CONTINUE implementation for non-huge pages by calling vm_ops->fault() instead. A new VMF flag, FAULT_FLAG_USERFAULT_CONTINUE, is introduced to avoid a recursive call to handle_userfault().
Suggested-by: James Houghton <jthoughton@google.com>
Signed-off-by: Nikita Kalyazin <kalyazin@amazon.com>
---
 include/linux/mm_types.h |  4 ++++
 mm/hugetlb.c             |  2 +-
 mm/shmem.c               |  9 ++++++---
 mm/userfaultfd.c         | 37 +++++++++++++++++++++++++++----------
 4 files changed, 38 insertions(+), 14 deletions(-)
diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 0234f14f2aa6..2f26ee9742bf 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -1429,6 +1429,9 @@ enum tlb_flush_reason {
  * @FAULT_FLAG_ORIG_PTE_VALID: whether the fault has vmf->orig_pte cached.
  *                             We should only access orig_pte if this flag set.
  * @FAULT_FLAG_VMA_LOCK: The fault is handled under VMA lock.
+ * @FAULT_FLAG_USERFAULT_CONTINUE: The fault handler must not call userfaultfd
+ *                                 minor handler as it is being called by the
+ *                                 userfaultfd code itself.
  *
  * About @FAULT_FLAG_ALLOW_RETRY and @FAULT_FLAG_TRIED: we can specify
  * whether we would allow page faults to retry by specifying these two
@@ -1467,6 +1470,7 @@ enum fault_flag {
 	FAULT_FLAG_UNSHARE = 1 << 10,
 	FAULT_FLAG_ORIG_PTE_VALID = 1 << 11,
 	FAULT_FLAG_VMA_LOCK = 1 << 12,
+	FAULT_FLAG_USERFAULT_CONTINUE = 1 << 13,
 };

 typedef unsigned int __bitwise zap_flags_t;
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 97930d44d460..c004cfdcd4e2 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -6228,7 +6228,7 @@ static vm_fault_t hugetlb_no_page(struct address_space *mapping,
 	}

 	/* Check for page in userfault range. */
-	if (userfaultfd_minor(vma)) {
+	if (userfaultfd_minor(vma) && !(vmf->flags & FAULT_FLAG_USERFAULT_CONTINUE)) {
 		folio_unlock(folio);
 		folio_put(folio);
 		/* See comment in userfaultfd_missing() block above */
diff --git a/mm/shmem.c b/mm/shmem.c
index 1ede0800e846..b4159303fe59 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -2467,7 +2467,8 @@ static int shmem_get_folio_gfp(struct inode *inode, pgoff_t index,
 	fault_mm = vma ? vma->vm_mm : NULL;

 	folio = filemap_get_entry(inode->i_mapping, index);
-	if (folio && vma && userfaultfd_minor(vma)) {
+	if (folio && vma && userfaultfd_minor(vma) &&
+	    !(vmf->flags & FAULT_FLAG_USERFAULT_CONTINUE)) {
 		if (!xa_is_value(folio))
 			folio_put(folio);
 		*fault_type = handle_userfault(vmf, VM_UFFD_MINOR);
@@ -2727,6 +2728,8 @@ static vm_fault_t shmem_falloc_wait(struct vm_fault *vmf, struct inode *inode)
 static vm_fault_t shmem_fault(struct vm_fault *vmf)
 {
 	struct inode *inode = file_inode(vmf->vma->vm_file);
+	enum sgp_type sgp = vmf->flags & FAULT_FLAG_USERFAULT_CONTINUE ?
+			    SGP_NOALLOC : SGP_CACHE;
 	gfp_t gfp = mapping_gfp_mask(inode->i_mapping);
 	struct folio *folio = NULL;
 	vm_fault_t ret = 0;
@@ -2743,8 +2746,8 @@ static vm_fault_t shmem_fault(struct vm_fault *vmf)
 	}

 	WARN_ON_ONCE(vmf->page != NULL);
-	err = shmem_get_folio_gfp(inode, vmf->pgoff, 0, &folio, SGP_CACHE,
-				  gfp, vmf, &ret);
+	err = shmem_get_folio_gfp(inode, vmf->pgoff, 0, &folio, sgp, gfp, vmf,
+				  &ret);
 	if (err)
 		return vmf_error(err);
 	if (folio) {
diff --git a/mm/userfaultfd.c b/mm/userfaultfd.c
index d06453fa8aba..4b3dbc7dac64 100644
--- a/mm/userfaultfd.c
+++ b/mm/userfaultfd.c
@@ -380,30 +380,47 @@ static int mfill_atomic_pte_zeropage(pmd_t *dst_pmd,
 	return ret;
 }

-/* Handles UFFDIO_CONTINUE for all shmem VMAs (shared or private). */
+/* Handles UFFDIO_CONTINUE for all VMAs */
 static int mfill_atomic_pte_continue(pmd_t *dst_pmd,
 				     struct vm_area_struct *dst_vma,
 				     unsigned long dst_addr,
 				     uffd_flags_t flags)
 {
-	struct inode *inode = file_inode(dst_vma->vm_file);
-	pgoff_t pgoff = linear_page_index(dst_vma, dst_addr);
 	struct folio *folio;
 	struct page *page;
 	int ret;
+	struct vm_fault vmf = {
+		.vma = dst_vma,
+		.address = dst_addr,
+		.flags = FAULT_FLAG_WRITE | FAULT_FLAG_REMOTE |
+			 FAULT_FLAG_USERFAULT_CONTINUE,
+		.pte = NULL,
+		.page = NULL,
+		.pgoff = linear_page_index(dst_vma, dst_addr),
+	};
+
+	if (!dst_vma->vm_ops || !dst_vma->vm_ops->fault)
+		return -EINVAL;

-	ret = shmem_get_folio(inode, pgoff, 0, &folio, SGP_NOALLOC);
-	/* Our caller expects us to return -EFAULT if we failed to find folio */
-	if (ret == -ENOENT)
+retry:
+	ret = dst_vma->vm_ops->fault(&vmf);
+	if (ret & VM_FAULT_ERROR) {
 		ret = -EFAULT;
-	if (ret)
 		goto out;
-	if (!folio) {
-		ret = -EFAULT;
+	}
+
+	if (ret & VM_FAULT_NOPAGE) {
+		ret = -EAGAIN;
 		goto out;
 	}

-	page = folio_file_page(folio, pgoff);
+	if (ret & VM_FAULT_RETRY)
+		goto retry;
+
+	page = vmf.page;
+	folio = page_folio(page);
+	BUG_ON(!folio);
+
 	if (PageHWPoison(page)) {
 		ret = -EIO;
 		goto out_release;
The new operation makes it possible to decouple the userfaultfd code from its dependencies on specific VMA types, namely shmem and hugetlb. The vm_flags bitmap argument is processed with "any" logic: if the VMA type supports any of the flags set, the operation returns true. This avoids multiple calls when checking for __VM_UFFD_FLAGS.
Signed-off-by: Nikita Kalyazin <kalyazin@amazon.com>
---
 include/linux/mm.h | 5 +++++
 mm/hugetlb.c       | 7 +++++++
 mm/shmem.c         | 8 ++++++++
 3 files changed, 20 insertions(+)
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 8483e09aeb2c..488d721d8542 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -680,6 +680,11 @@ struct vm_operations_struct {
 	 */
 	struct page *(*find_special_page)(struct vm_area_struct *vma,
 					  unsigned long addr);
+	/*
+	 * True if the VMA supports userfault at least for one of the vm_flags
+	 */
+	bool (*can_userfault)(struct vm_area_struct *vma,
+			      unsigned long vm_flags);
 };

 #ifdef CONFIG_NUMA_BALANCING
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index c004cfdcd4e2..f3901c11e1fd 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -5143,6 +5143,12 @@ static unsigned long hugetlb_vm_op_pagesize(struct vm_area_struct *vma)
 	return huge_page_size(hstate_vma(vma));
 }

+static bool hugetlb_vm_op_can_userfault(struct vm_area_struct *vma,
+					unsigned long vm_flags)
+{
+	return true;
+}
+
 /*
  * We cannot handle pagefaults against hugetlb pages at all.  They cause
  * handle_mm_fault() to try to instantiate regular-sized pages in the
@@ -5168,6 +5174,7 @@ const struct vm_operations_struct hugetlb_vm_ops = {
 	.close = hugetlb_vm_op_close,
 	.may_split = hugetlb_vm_op_split,
 	.pagesize = hugetlb_vm_op_pagesize,
+	.can_userfault = hugetlb_vm_op_can_userfault,
 };

 static pte_t make_huge_pte(struct vm_area_struct *vma, struct page *page,
diff --git a/mm/shmem.c b/mm/shmem.c
index b4159303fe59..0b9e19abd1e9 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -2891,6 +2891,12 @@ static struct mempolicy *shmem_get_policy(struct vm_area_struct *vma,
 	return mpol_shared_policy_lookup(&SHMEM_I(inode)->policy, index);
 }

+static bool shmem_can_userfault(struct vm_area_struct *vma,
+				unsigned long vm_flags)
+{
+	return true;
+}
+
 static struct mempolicy *shmem_get_pgoff_policy(struct shmem_inode_info *info,
 						pgoff_t index, unsigned int order,
 						pgoff_t *ilx)
 {
@@ -5309,6 +5315,7 @@ static const struct vm_operations_struct shmem_vm_ops = {
 	.set_policy = shmem_set_policy,
 	.get_policy = shmem_get_policy,
 #endif
+	.can_userfault = shmem_can_userfault,
 };

 static const struct vm_operations_struct shmem_anon_vm_ops = {
@@ -5318,6 +5325,7 @@ static const struct vm_operations_struct shmem_anon_vm_ops = {
 	.set_policy = shmem_set_policy,
 	.get_policy = shmem_get_policy,
 #endif
+	.can_userfault = shmem_can_userfault,
 };
int shmem_init_fs_context(struct fs_context *fc)
Signed-off-by: Nikita Kalyazin <kalyazin@amazon.com>
---
 include/linux/userfaultfd_k.h | 13 ++++++-------
 mm/userfaultfd.c              | 10 +++++++---
 2 files changed, 13 insertions(+), 10 deletions(-)
diff --git a/include/linux/userfaultfd_k.h b/include/linux/userfaultfd_k.h
index 75342022d144..64551e8a55fb 100644
--- a/include/linux/userfaultfd_k.h
+++ b/include/linux/userfaultfd_k.h
@@ -221,8 +221,8 @@ static inline bool vma_can_userfault(struct vm_area_struct *vma,
 	if (vm_flags & VM_DROPPABLE)
 		return false;

-	if ((vm_flags & VM_UFFD_MINOR) &&
-	    (!is_vm_hugetlb_page(vma) && !vma_is_shmem(vma)))
+	if (!vma->vm_ops->can_userfault ||
+	    !vma->vm_ops->can_userfault(vma, VM_UFFD_MINOR))
 		return false;

 	/*
@@ -235,16 +235,15 @@ static inline bool vma_can_userfault(struct vm_area_struct *vma,
 #ifndef CONFIG_PTE_MARKER_UFFD_WP
 	/*
 	 * If user requested uffd-wp but not enabled pte markers for
-	 * uffd-wp, then shmem & hugetlbfs are not supported but only
-	 * anonymous.
+	 * uffd-wp, then only anonymous is supported.
 	 */
 	if ((vm_flags & VM_UFFD_WP) && !vma_is_anonymous(vma))
 		return false;
 #endif

-	/* By default, allow any of anon|shmem|hugetlb */
-	return vma_is_anonymous(vma) || is_vm_hugetlb_page(vma) ||
-	    vma_is_shmem(vma);
+	return vma_is_anonymous(vma) ||
+	       (vma->vm_ops->can_userfault &&
+		vma->vm_ops->can_userfault(vma, vm_flags));
 }

 static inline bool vma_has_uffd_without_event_remap(struct vm_area_struct *vma)
diff --git a/mm/userfaultfd.c b/mm/userfaultfd.c
index 4b3dbc7dac64..0aa82c968e16 100644
--- a/mm/userfaultfd.c
+++ b/mm/userfaultfd.c
@@ -728,6 +728,7 @@ static __always_inline ssize_t mfill_atomic(struct userfaultfd_ctx *ctx,
 	unsigned long src_addr, dst_addr;
 	long copied;
 	struct folio *folio;
+	bool can_userfault;

 	/*
 	 * Sanitize the command parameters:
@@ -787,10 +788,13 @@ static __always_inline ssize_t mfill_atomic(struct userfaultfd_ctx *ctx,
 		return mfill_atomic_hugetlb(ctx, dst_vma, dst_start,
 					    src_start, len, flags);

-	if (!vma_is_anonymous(dst_vma) && !vma_is_shmem(dst_vma))
+	can_userfault = dst_vma->vm_ops->can_userfault &&
+			dst_vma->vm_ops->can_userfault(dst_vma, __VM_UFFD_FLAGS);
+
+	if (!vma_is_anonymous(dst_vma) && !can_userfault)
 		goto out_unlock;
-	if (!vma_is_shmem(dst_vma) &&
-	    uffd_flags_mode_is(flags, MFILL_ATOMIC_CONTINUE))
+
+	if (!can_userfault && uffd_flags_mode_is(flags, MFILL_ATOMIC_CONTINUE))
 		goto out_unlock;
while (src_addr < src_start + len) {
Add support for sending a page fault event to userspace if userfaultfd is registered for the guest_memfd VMA. Only the minor fault event is currently supported.
Signed-off-by: Nikita Kalyazin <kalyazin@amazon.com>
---
 virt/kvm/guest_memfd.c | 10 ++++++++++
 1 file changed, 10 insertions(+)
diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c
index fbf89e643add..096d89e7282d 100644
--- a/virt/kvm/guest_memfd.c
+++ b/virt/kvm/guest_memfd.c
@@ -4,6 +4,9 @@
 #include <linux/kvm_host.h>
 #include <linux/pagemap.h>
 #include <linux/anon_inodes.h>
+#ifdef CONFIG_KVM_PRIVATE_MEM
+#include <linux/userfaultfd_k.h>
+#endif /* CONFIG_KVM_PRIVATE_MEM */

 #include "kvm_mm.h"

@@ -380,6 +383,13 @@ static vm_fault_t kvm_gmem_fault(struct vm_fault *vmf)
 		kvm_gmem_mark_prepared(folio);
 	}

+	if (userfaultfd_minor(vmf->vma) &&
+	    !(vmf->flags & FAULT_FLAG_USERFAULT_CONTINUE)) {
+		folio_unlock(folio);
+		filemap_invalidate_unlock_shared(inode->i_mapping);
+		return handle_userfault(vmf, VM_UFFD_MINOR);
+	}
+
 	vmf->page = folio_file_page(folio, vmf->pgoff);
out_folio:
Signed-off-by: Nikita Kalyazin <kalyazin@amazon.com>
---
 fs/userfaultfd.c                 | 3 ++-
 include/uapi/linux/userfaultfd.h | 8 +++++++-
 2 files changed, 9 insertions(+), 2 deletions(-)
diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c
index 97c4d71115d8..32152bfa462a 100644
--- a/fs/userfaultfd.c
+++ b/fs/userfaultfd.c
@@ -1954,7 +1954,8 @@ static int userfaultfd_api(struct userfaultfd_ctx *ctx,
 	uffdio_api.features = UFFD_API_FEATURES;
 #ifndef CONFIG_HAVE_ARCH_USERFAULTFD_MINOR
 	uffdio_api.features &=
-		~(UFFD_FEATURE_MINOR_HUGETLBFS | UFFD_FEATURE_MINOR_SHMEM);
+		~(UFFD_FEATURE_MINOR_HUGETLBFS | UFFD_FEATURE_MINOR_SHMEM |
+		  UFFD_FEATURE_MINOR_GUEST_MEMFD);
 #endif
 #ifndef CONFIG_HAVE_ARCH_USERFAULTFD_WP
 	uffdio_api.features &= ~UFFD_FEATURE_PAGEFAULT_FLAG_WP;
diff --git a/include/uapi/linux/userfaultfd.h b/include/uapi/linux/userfaultfd.h
index 2841e4ea8f2c..ed688797eba7 100644
--- a/include/uapi/linux/userfaultfd.h
+++ b/include/uapi/linux/userfaultfd.h
@@ -42,7 +42,8 @@
 			   UFFD_FEATURE_WP_UNPOPULATED |	\
 			   UFFD_FEATURE_POISON |		\
 			   UFFD_FEATURE_WP_ASYNC |		\
-			   UFFD_FEATURE_MOVE)
+			   UFFD_FEATURE_MOVE |			\
+			   UFFD_FEATURE_MINOR_GUEST_MEMFD)
 #define UFFD_API_IOCTLS				\
 	((__u64)1 << _UFFDIO_REGISTER |		\
 	 (__u64)1 << _UFFDIO_UNREGISTER |	\
@@ -230,6 +231,10 @@ struct uffdio_api {
 	 *
 	 * UFFD_FEATURE_MOVE indicates that the kernel supports moving an
 	 * existing page contents from userspace.
+	 *
+	 * UFFD_FEATURE_MINOR_GUEST_MEMFD indicates the same support as
+	 * UFFD_FEATURE_MINOR_HUGETLBFS, but for guest_memfd-backed pages
+	 * instead.
 	 */
 #define UFFD_FEATURE_PAGEFAULT_FLAG_WP		(1<<0)
 #define UFFD_FEATURE_EVENT_FORK			(1<<1)
@@ -248,6 +253,7 @@ struct uffdio_api {
 #define UFFD_FEATURE_POISON			(1<<14)
 #define UFFD_FEATURE_WP_ASYNC			(1<<15)
 #define UFFD_FEATURE_MOVE			(1<<16)
+#define UFFD_FEATURE_MINOR_GUEST_MEMFD		(1<<17)
 	__u64 features;
__u64 ioctls;
The test demonstrates that a minor userfaultfd event in guest_memfd can be resolved via a memcpy followed by a UFFDIO_CONTINUE ioctl.
Signed-off-by: Nikita Kalyazin <kalyazin@amazon.com>
---
 .../testing/selftests/kvm/guest_memfd_test.c  | 99 +++++++++++++++++++
 1 file changed, 99 insertions(+)
diff --git a/tools/testing/selftests/kvm/guest_memfd_test.c b/tools/testing/selftests/kvm/guest_memfd_test.c
index 38c501e49e0e..9a06c2486218 100644
--- a/tools/testing/selftests/kvm/guest_memfd_test.c
+++ b/tools/testing/selftests/kvm/guest_memfd_test.c
@@ -10,12 +10,16 @@
 #include <errno.h>
 #include <stdio.h>
 #include <fcntl.h>
+#include <pthread.h>

 #include <linux/bitmap.h>
 #include <linux/falloc.h>
+#include <linux/userfaultfd.h>
 #include <sys/mman.h>
 #include <sys/types.h>
 #include <sys/stat.h>
+#include <sys/syscall.h>
+#include <sys/ioctl.h>

 #include "kvm_util.h"
 #include "test_util.h"
@@ -206,6 +210,98 @@ static void test_create_guest_memfd_multiple(struct kvm_vm *vm)
 	close(fd1);
 }

+struct fault_args {
+	char *addr;
+	volatile char value;
+};
+
+static void *fault_thread_fn(void *arg)
+{
+	struct fault_args *args = arg;
+
+	/* Trigger page fault */
+	args->value = *args->addr;
+	return NULL;
+}
+
+static void test_uffd_minor(int fd, size_t page_size, size_t total_size)
+{
+	struct uffdio_register uffd_reg;
+	struct uffdio_continue uffd_cont;
+	struct uffd_msg msg;
+	struct fault_args args;
+	pthread_t fault_thread;
+	void *mem, *mem_nofault, *buf = NULL;
+	int uffd, ret;
+	off_t offset = page_size;
+	void *fault_addr;
+
+	ret = posix_memalign(&buf, page_size, total_size);
+	TEST_ASSERT_EQ(ret, 0);
+
+	uffd = syscall(__NR_userfaultfd, O_CLOEXEC);
+	TEST_ASSERT(uffd != -1, "userfaultfd creation should succeed");
+
+	struct uffdio_api uffdio_api = {
+		.api = UFFD_API,
+		.features = UFFD_FEATURE_MINOR_GUEST_MEMFD,
+	};
+	ret = ioctl(uffd, UFFDIO_API, &uffdio_api);
+	TEST_ASSERT(ret != -1, "ioctl(UFFDIO_API) should succeed");
+
+	mem = mmap(NULL, total_size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
+	TEST_ASSERT(mem != MAP_FAILED, "mmap should succeed");
+
+	mem_nofault = mmap(NULL, total_size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
+	TEST_ASSERT(mem_nofault != MAP_FAILED, "mmap should succeed");
+
+	uffd_reg.range.start = (unsigned long)mem;
+	uffd_reg.range.len = total_size;
+	uffd_reg.mode = UFFDIO_REGISTER_MODE_MINOR;
+	ret = ioctl(uffd, UFFDIO_REGISTER, &uffd_reg);
+	TEST_ASSERT(ret != -1, "ioctl(UFFDIO_REGISTER) should succeed");
+
+	ret = fallocate(fd, FALLOC_FL_KEEP_SIZE | FALLOC_FL_PUNCH_HOLE,
+			offset, page_size);
+	TEST_ASSERT(!ret, "fallocate(PUNCH_HOLE) should succeed");
+
+	fault_addr = mem + offset;
+	args.addr = fault_addr;
+
+	ret = pthread_create(&fault_thread, NULL, fault_thread_fn, &args);
+	TEST_ASSERT(ret == 0, "pthread_create should succeed");
+
+	ret = read(uffd, &msg, sizeof(msg));
+	TEST_ASSERT(ret != -1, "read from userfaultfd should succeed");
+	TEST_ASSERT(msg.event == UFFD_EVENT_PAGEFAULT, "event type should be pagefault");
+	TEST_ASSERT((void *)(msg.arg.pagefault.address & ~(page_size - 1)) == fault_addr,
+		    "pagefault should occur at expected address");
+
+	memcpy(mem_nofault + offset, buf + offset, page_size);
+
+	uffd_cont.range.start = (unsigned long)fault_addr;
+	uffd_cont.range.len = page_size;
+	uffd_cont.mode = 0;
+	ret = ioctl(uffd, UFFDIO_CONTINUE, &uffd_cont);
+	TEST_ASSERT(ret != -1, "ioctl(UFFDIO_CONTINUE) should succeed");
+
+	TEST_ASSERT(args.value == *(char *)(mem_nofault + offset),
+		    "memory should contain the value that was copied");
+	TEST_ASSERT(args.value == *(char *)(mem + offset),
+		    "no further fault is expected");
+
+	ret = pthread_join(fault_thread, NULL);
+	TEST_ASSERT(ret == 0, "pthread_join should succeed");
+
+	ret = munmap(mem_nofault, total_size);
+	TEST_ASSERT(!ret, "munmap should succeed");
+
+	ret = munmap(mem, total_size);
+	TEST_ASSERT(!ret, "munmap should succeed");
+	free(buf);
+	close(uffd);
+}
+
 unsigned long get_shared_type(void)
 {
 #ifdef __x86_64__
@@ -244,6 +340,9 @@ void test_vm_type(unsigned long type, bool is_shared)
 	test_fallocate(fd, page_size, total_size);
 	test_invalid_punch_hole(fd, page_size, total_size);

+	if (is_shared)
+		test_uffd_minor(fd, page_size, total_size);
+
 	close(fd);
 	kvm_vm_release(vm);
 }
On Fri, Apr 04, 2025 at 03:43:46PM +0000, Nikita Kalyazin wrote:
This series is built on top of the Fuad's v7 "mapping guest_memfd backed memory at the host" [1].
Hm if this is based on an unmerged series this seems quite speculative and should maybe be an RFC? I mean that series at least still seems quite under discussion/experiencing issues?
Maybe worth RFC'ing until that one settles down first to avoid complexity in review/application to tree?
Thanks!
On 04/04/2025 17:33, Lorenzo Stoakes wrote:
On Fri, Apr 04, 2025 at 03:43:46PM +0000, Nikita Kalyazin wrote:
This series is built on top of the Fuad's v7 "mapping guest_memfd backed memory at the host" [1].
Hm if this is based on an unmerged series this seems quite speculative and should maybe be an RFC? I mean that series at least still seems quite under discussion/experiencing issues?
Maybe worth RFC'ing until that one settles down first to avoid complexity in review/application to tree?
Hi,
I dropped the RFC tag because I saw similar examples before, but I'm happy to bring it back next time if the dependency is not merged until then.
Thanks!
Thanks!
On Fri, Apr 04, 2025 at 05:56:58PM +0100, Nikita Kalyazin wrote:
On 04/04/2025 17:33, Lorenzo Stoakes wrote:
On Fri, Apr 04, 2025 at 03:43:46PM +0000, Nikita Kalyazin wrote:
This series is built on top of the Fuad's v7 "mapping guest_memfd backed memory at the host" [1].
Hm if this is based on an unmerged series this seems quite speculative and should maybe be an RFC? I mean that series at least still seems quite under discussion/experiencing issues?
Maybe worth RFC'ing until that one settles down first to avoid complexity in review/application to tree?
Hi,
I dropped the RFC tag because I saw similar examples before, but I'm happy to bring it back next time if the dependency is not merged until then.
Yeah really sorry to be a pain haha, I realise this particular situation is a bit unclear, but I think just for the sake of getting our ducks in a row and ensuring things are settled on the baseline (and it's sort of a fairly big baseline), it'd be best to bring it back!
Thanks!
Thanks!
Cheers!
+To authors of v7 series referenced in [1]
* Nikita Kalyazin <kalyazin@amazon.com> [250404 11:44]:
This series is built on top of the Fuad's v7 "mapping guest_memfd backed memory at the host" [1].
I didn't see their addresses in the to/cc, so I added them to my response as I reference the v7 patch set below.
With James's KVM userfault [2], it is possible to handle stage-2 faults in guest_memfd in userspace. However, KVM itself also triggers faults in guest_memfd in some cases, for example: PV interfaces like kvmclock, PV EOI and page table walking code when fetching the MMIO instruction on x86. It was agreed in the guest_memfd upstream call on 23 Jan 2025 [3] that KVM would be accessing those pages via userspace page tables.
Thanks for being open about the technical call, but it would be better to capture the reasons and not the call date. I explain why in the linking section as well.
In order for such faults to be handled in userspace, guest_memfd needs to support userfaultfd.
Changes since v2 [4]:
- James: Fix sgp type when calling shmem_get_folio_gfp
- James: Improved vm_ops->fault() error handling
- James: Add and make use of the can_userfault() VMA operation
- James: Add UFFD_FEATURE_MINOR_GUEST_MEMFD feature flag
- James: Fix typos and add more checks in the test
Nikita
Please slow down...
This patch is at v3, the v7 patch that you are building off has lockdep issues [1] reported by one of the authors, and (sorry for sounding harsh about the v7 of that patch) the cover letter reads a bit more like an RFC than a set ready to go into linux-mm.
Maybe the lockdep issue is just a patch ordering thing or removed in a later patch set, but that's not mentioned in the discovery email?
What exactly is the goal here and the path forward for the rest of us trying to build on this once it's in mm-new/mm-unstable?
Note that mm-unstable is shared with a lot of other people through linux-next, and we are really trying to stop breaking stuff on them.
Obviously v7 cannot go in until it works with lockdep - otherwise none of us can use lockdep which is not okay.
Also, I am concerned about the amount of testing in the v7 and v3 patch sets that did not bring up a lockdep issue.
[1] https://lore.kernel.org/kvm/20250318161823.4005529-1-tabba@google.com/T/
[2] https://lore.kernel.org/kvm/20250109204929.1106563-1-jthoughton@google.com/T...
[3] https://docs.google.com/document/d/1M6766BzdY1Lhk7LiR5IqVR8B8mG3cr-cxTxOrAos...
If there is anything we need to know about the decisions in the call and that document, can you please pull it into this change log?
I don't think anyone can ensure google will not rename docs to some other office theme tomorrow - as they famously ditch basically every name and application.
Also, most of the community does not want to go to a 17 page (and growing) spreadsheet to hunt down the facts when there is an acceptable and ideal place to document them in git. It's another barrier of entry on reviewing your code as well.
But please, don't take this suggestion as carte blanche for copying a conversation from the doc, just give us the technical reasons for your decisions as briefly as possible.
[4] https://lore.kernel.org/kvm/20250402160721.97596-1-kalyazin@amazon.com/T/
[1] https://lore.kernel.org/all/diqz1puanquh.fsf@ackerleytng-ctop.c.googlers.com...
Thanks, Liam
On 04/04/2025 18:12, Liam R. Howlett wrote:
+To authors of v7 series referenced in [1]
- Nikita Kalyazin kalyazin@amazon.com [250404 11:44]:
This series is built on top of the Fuad's v7 "mapping guest_memfd backed memory at the host" [1].
I didn't see their addresses in the to/cc, so I added them to my response as I reference the v7 patch set below.
Hi Liam,
Thanks for the feedback and for extending the list.
With James's KVM userfault [2], it is possible to handle stage-2 faults in guest_memfd in userspace. However, KVM itself also triggers faults in guest_memfd in some cases, for example: PV interfaces like kvmclock, PV EOI and page table walking code when fetching the MMIO instruction on x86. It was agreed in the guest_memfd upstream call on 23 Jan 2025 [3] that KVM would be accessing those pages via userspace page tables.
Thanks for being open about the technical call, but it would be better to capture the reasons and not the call date. I explain why in the linking section as well.
Thanks for bringing that up. The document mostly contains the decision itself. The main alternative considered previously was a temporary reintroduction of the pages to the direct map whenever a KVM-internal access is required, which came with significant complexity in guaranteeing correctness in all cases [1]. Since the memslot structure already contains a guest memory pointer supplied by userspace, KVM can use it directly when in the VMM or vCPU context. I will add this in the cover letter for the next version.
[1] https://lore.kernel.org/kvm/20240709132041.3625501-1-roypat@amazon.co.uk/T/#...
In order for such faults to be handled in userspace, guest_memfd needs to support userfaultfd.
Changes since v2 [4]:
- James: Fix sgp type when calling shmem_get_folio_gfp
- James: Improved vm_ops->fault() error handling
- James: Add and make use of the can_userfault() VMA operation
- James: Add UFFD_FEATURE_MINOR_GUEST_MEMFD feature flag
- James: Fix typos and add more checks in the test
Nikita
Please slow down...
This patch is at v3, the v7 patch that you are building off has lockdep issues [1] reported by one of the authors, and (sorry for sounding harsh about the v7 of that patch) the cover letter reads a bit more like an RFC than a set ready to go into linux-mm.
AFAIK the lockdep issue was reported on a v7 of a different change. I'm basing my series on [2] ("KVM: Mapping guest_memfd backed memory at the host for software protected VMs"), while the issue was reported on [2] ("KVM: Restricted mapping of guest_memfd at the host and arm64 support"), which is also built on top of [2]. Please correct me if I'm missing something.
The key feature that is required by my series is the ability to mmap guest_memfd when the VM type allows. My understanding is no-one is opposed to that as of now, that's why I assumed it's safe to build on top of that.
[2] https://lore.kernel.org/kvm/20250318161823.4005529-1-tabba@google.com/T/ [3] https://lore.kernel.org/all/diqz1puanquh.fsf@ackerleytng-ctop.c.googlers.com...
Maybe the lockdep issue is just a patch ordering thing or removed in a later patch set, but that's not mentioned in the discovery email?
What exactly is the goal here and the path forward for the rest of us trying to build on this once it's in mm-new/mm-unstable?
Note that mm-unstable is shared with a lot of other people through linux-next, and we are really trying to stop breaking stuff on them.
Obviously v7 cannot go in until it works with lockdep - otherwise none of us can use lockdep which is not okay.
Also, I am concerned about the amount of testing in the v7 and v3 patch sets that did not bring up a lockdep issue..
[1] https://lore.kernel.org/kvm/20250318161823.4005529-1-tabba@google.com/T/ [2] https://lore.kernel.org/kvm/20250109204929.1106563-1-jthoughton@google.com/T... [3] https://docs.google.com/document/d/1M6766BzdY1Lhk7LiR5IqVR8B8mG3cr-cxTxOrAos...
If there is anything we need to know about the decisions in the call and that document, can you please pull it into this change log?
I don't think anyone can ensure google will not rename docs to some other office theme tomorrow - as they famously ditch basically every name and application.
Also, most of the community does not want to go to a 17 page (and growing) spreadsheet to hunt down the facts when there is an acceptable and ideal place to document them in git. It's another barrier of entry on reviewing your code as well.
But please, don't take this suggestion as carte blanche for copying a conversation from the doc, just give us the technical reasons for your decisions as briefly as possible.
[4] https://lore.kernel.org/kvm/20250402160721.97596-1-kalyazin@amazon.com/T/
[1]. https://lore.kernel.org/all/diqz1puanquh.fsf@ackerleytng-ctop.c.googlers.com...
Thanks, Liam
* Nikita Kalyazin kalyazin@amazon.com [250407 07:04]:
On 04/04/2025 18:12, Liam R. Howlett wrote:
+To authors of v7 series referenced in [1]
- Nikita Kalyazin kalyazin@amazon.com [250404 11:44]:
This series is built on top of Fuad's v7 "mapping guest_memfd backed memory at the host" [1].
I didn't see their addresses in the to/cc, so I added them to my response as I reference the v7 patch set below.
Hi Liam,
Thanks for the feedback and for extending the list.
With James's KVM userfault [2], it is possible to handle stage-2 faults in guest_memfd in userspace. However, KVM itself also triggers faults in guest_memfd in some cases, for example: PV interfaces like kvmclock, PV EOI and page table walking code when fetching the MMIO instruction on x86. It was agreed in the guest_memfd upstream call on 23 Jan 2025 [3] that KVM would be accessing those pages via userspace page tables.
Thanks for being open about the technical call, but it would be better to capture the reasons and not the call date. I explain why in the linking section as well.
Thanks for bringing that up. The document mostly contains the decision itself. The main alternative considered previously was a temporary reintroduction of the pages to the direct map whenever a KVM-internal access is required. It was coming with a significant complexity of guaranteeing correctness in all cases [1]. Since the memslot structure already contains a guest memory pointer supplied by the userspace, KVM can use it directly when in the VMM or vCPU context. I will add this in the cover for the next version.
Thank you.
[1] https://lore.kernel.org/kvm/20240709132041.3625501-1-roypat@amazon.co.uk/T/#...
...
Please slow down...
This patch is at v3, the v7 patch that you are building off has lockdep issues [1] reported by one of the authors, and (sorry for sounding harsh about the v7 of that patch) the cover letter reads a bit more like an RFC than a set ready to go into linux-mm.
AFAIK the lockdep issue was reported on a v7 of a different change. I'm basing my series on [2] ("KVM: Mapping guest_memfd backed memory at the host for software protected VMs"), while the issue was reported on [2] ("KVM: Restricted mapping of guest_memfd at the host and arm64 support"), which is also built on top of [2]. Please correct me if I'm missing something.
I think you messed up the numbering in your statement above.
I believe you are making the point that I messed up which patches depend on what and your code does not depend on faulty locking, which appears to be the case.
There are a few issues with the required patch set?
The key feature that is required by my series is the ability to mmap guest_memfd when the VM type allows. My understanding is no-one is opposed to that as of now, that's why I assumed it's safe to build on top of that.
[2] https://lore.kernel.org/kvm/20250318161823.4005529-1-tabba@google.com/T/ [3] https://lore.kernel.org/all/diqz1puanquh.fsf@ackerleytng-ctop.c.googlers.com...
All of this is extremely confusing because the onus of figuring out what the final code will look like is put on the reviewer. As it is, we have issues with people not doing enough review of the code (due to limited time). One way to get reviews is to make the barrier of entry as low as possible.
I spent Friday going down a rabbit hole of patches referring to each other as dependencies and I gave up. It looks like I mistook one set of patches as required vs them requiring the same in-flight ones as your patches.
I am struggling to see how we can adequately support all of you given the way the patches are sent out in batches with dependencies - it is just too time consuming to sort out.
Thank you, Liam
On 07/04/2025 14:40, Liam R. Howlett wrote:
- Nikita Kalyazin kalyazin@amazon.com [250407 07:04]:
On 04/04/2025 18:12, Liam R. Howlett wrote:
...
This patch is at v3, the v7 patch that you are building off has lockdep issues [1] reported by one of the authors, and (sorry for sounding harsh about the v7 of that patch) the cover letter reads a bit more like an RFC than a set ready to go into linux-mm.
AFAIK the lockdep issue was reported on a v7 of a different change. I'm basing my series on [2] ("KVM: Mapping guest_memfd backed memory at the host for software protected VMs"), while the issue was reported on [2] ("KVM: Restricted mapping of guest_memfd at the host and arm64 support"), which is also built on top of [2]. Please correct me if I'm missing something.
I think you messed up the numbering in your statement above.
I did, in an attempt to make it "even more clear" :) Sorry about that, glad you got the intention.
I believe you are making the point that I messed up which patches depend on what and your code does not depend on faulty locking, which appears to be the case.
There are a few issues with the required patch set?
There are indeed, but not in the part this series depends on, as far as I can see.
The key feature that is required by my series is the ability to mmap guest_memfd when the VM type allows. My understanding is no-one is opposed to that as of now, that's why I assumed it's safe to build on top of that.
[2] https://lore.kernel.org/kvm/20250318161823.4005529-1-tabba@google.com/T/ [3] https://lore.kernel.org/all/diqz1puanquh.fsf@ackerleytng-ctop.c.googlers.com...
All of this is extremely confusing because the onus of figuring out what the final code will look like is put on the reviewer. As it is, we have issues with people not doing enough review of the code (due to limited time). One way to get reviews is to make the barrier of entry as low as possible.
I spent Friday going down a rabbit hole of patches referring to each other as dependencies and I gave up. It looks like I mistook one set of patches as required vs them requiring the same in-flight ones as your patches.
I am struggling to see how we can adequately support all of you given the way the patches are sent out in batches with dependencies - it is just too time consuming to sort out.
I'm happy to do whatever I can to make the review easier. I suppose the extreme case is to wait for the dependencies to get accepted, effectively serialising submissions, but that slows the process down significantly. For example, I received very good feedback on v1 and v2 of this series and was able to address it instead of waiting for the dependency. Would including the required patches directly in the series help? My only concern is in that case the same patch will be submitted multiple times (as a part of every depending series), but if it's better, I'll be doing that instead.
Thank you, Liam
* Nikita Kalyazin kalyazin@amazon.com [250407 10:05]:
...
I'm happy to do whatever I can to make the review easier. I suppose the extreme case is to wait for the dependencies to get accepted, effectively serialising submissions, but that slows the process down significantly. For example, I received very good feedback on v1 and v2 of this series and was able to address it instead of waiting for the dependency. Would including the required patches directly in the series help? My only concern is in that case the same patch will be submitted multiple times (as a part of every depending series), but if it's better, I'll be doing that instead.
Don't resend patches that someone else is upstreaming, that'll cause other problems.
Three methods come to mind:
1. As you stated, wait for the dependencies to land. This will mean that what you are working against is well tested and won't change (and you won't have to re-spin due to an unstable base).
2. Combine them into a bigger patch set. I can then pull one patch set and look at the parts of interest to the mm side.
3. Provide a git repo with the necessary changes together.
I think 2 and 3 together should be used for the guest_memfd patches. Someone needs to be managing these to send upstream. See the discussion in another patch set on guest_memfd here [1].
As this is not based on fully upstream patches, this should be marked as RFC, imo.
Thanks, Liam
[1]. https://lore.kernel.org/all/aizia2elwspxcmfrjote5h7k5wdw2stp42slytkl5visrjvz...
On 07.04.25 16:24, Liam R. Howlett wrote:
- Nikita Kalyazin kalyazin@amazon.com [250407 10:05]:
...
I'm happy to do whatever I can to make the review easier. I suppose the extreme case is to wait for the dependencies to get accepted, effectively serialising submissions, but that slows the process down significantly. For example, I received very good feedback on v1 and v2 of this series and was able to address it instead of waiting for the dependency. Would including the required patches directly in the series help? My only concern is in that case the same patch will be submitted multiple times (as a part of every depending series), but if it's better, I'll be doing that instead.
Don't resend patches that someone else is upstreaming, that'll cause other problems.
Three methods come to mind:
- As you stated, wait for the dependencies to land. This will mean that what you are working against is well tested and won't change (and you won't have to re-spin due to an unstable base).
- Combine them into a bigger patch set. I can then pull one patch set and look at the parts of interest to the mm side.
- Provide a git repo with the necessary changes together.
I think 2 and 3 together should be used for the guest_memfd patches. Someone needs to be managing these to send upstream. See the discussion in another patch set on guest_memfd here [1].
The issue is that most extensions are fairly independent from each other, except that they all build on top of Fuad's mmap support.
Sending all together as one thing might not be the best option.
Once basic mmap support is upstream, some of the extensions (e.g., directmap removal) can go in next.
So until that is upstream, I agree that tagging the stuff that builds up on that is the right thing to do, and providing git trees is another very good idea.
I'll prioritize getting Fuad's mmap stuff reviewed. (I keep saying that, I know)
On Mon, Apr 07, 2025 at 04:46:48PM +0200, David Hildenbrand wrote:
...
The issue is that most extensions are fairly independent from each other, except that they all build on top of Fuad's mmap support.
Sending all together as one thing might not be the best option.
Once basic mmap support is upstream, some of the extensions (e.g., directmap removal) can go in next.
So until that is upstream, I agree that tagging the stuff that builds up on that is the right thing to do, and providing git trees is another very good idea.
I'll prioritize getting Fuad's mmap stuff reviewed. (I keep saying that, I know)
Which series is this? Sorry maybe lost track of this one.
-- Cheers,
David / dhildenb
On 07.04.25 17:14, Lorenzo Stoakes wrote:
On Mon, Apr 07, 2025 at 04:46:48PM +0200, David Hildenbrand wrote:
...
The issue is that most extensions are fairly independent from each other, except that they all build on top of Fuad's mmap support.
Sending all together as one thing might not be the best option.
Once basic mmap support is upstream, some of the extensions (e.g., directmap removal) can go in next.
So until that is upstream, I agree that tagging the stuff that builds up on that is the right thing to do, and providing git trees is another very good idea.
I'll prioritize getting Fuad's mmap stuff reviewed. (I keep saying that, I know)
Which series is this? Sorry maybe lost track of this one.
Heh, not your fault :)
The most important one for basic mmap support is "KVM: Mapping guest_memfd backed memory at the host for software protected VMs" [1]. Some stuff (e.g., direct map removal) should be able to make progress once that landed.
I do expect the MM-specific patch in there ("mm: Consolidate freeing of typed folios on final folio_put()") to not be included as part of that work.
[I shared the feedback from the LSF/MM session in the upstream guest_memfd call, and we decided to minimize the usage of the folio_put() callback to where absolutely required; that will simplify things and avoid issues as pointed out by Willy, which is great]
The next important one will be "[PATCH v7 0/7] KVM: Restricted mapping of guest_memfd at the host and arm64 support" [2], but I similarly expect a simplification as we try moving away from folio_put() for the "shared <-> private" page conversion case.
So I expect a v8 of [1] (and also that [2] will need to be updated).
@Fuad, please let me know if I am wrong.
[1] https://lore.kernel.org/all/20250318161823.4005529-1-tabba@google.com/T/#u [2] https://lore.kernel.org/all/20250328153133.3504118-1-tabba@google.com/
On Mon, Apr 07, 2025 at 04:46:48PM +0200, David Hildenbrand wrote:
...
The issue is that most extensions are fairly independent from each other, except that they all build on top of Fuad's mmap support.
Sending all together as one thing might not be the best option.
Once basic mmap support is upstream, some of the extensions (e.g., directmap removal) can go in next.
So until that is upstream, I agree that tagging the stuff that builds up on that is the right thing to do, and providing git trees is another very good idea.
I'll prioritize getting Fuad's mmap stuff reviewed. (I keep saying that, I know)
Fwiw, b4 allows you to specify dependencies, so you can b4 shazam/am and it will pull in all prerequisite patches:
b4 prep --edit-deps
    Edit the series dependencies in your defined $EDITOR (or core.editor)
Christian Brauner brauner@kernel.org writes:
On Mon, Apr 07, 2025 at 04:46:48PM +0200, David Hildenbrand wrote:
<snip>
Fwiw, b4 allows you to specify dependencies, so you can b4 shazam/am and it will pull in all prerequisite patches:
b4 prep --edit-deps
    Edit the series dependencies in your defined $EDITOR (or core.editor)
Thank you for this tip! On this note, what are some good CONFIGs people always enable during development?