On 01/12/2025 15:54, David Hildenbrand (Red Hat) wrote:
On 12/1/25 14:39, Nikita Kalyazin wrote:
On 30/11/2025 11:18, Mike Rapoport wrote:
From: "Mike Rapoport (Microsoft)" <rppt@kernel.org>
userfaultfd notifications about minor page faults are used for live migration and snapshotting of VMs with memory backed by shared hugetlbfs or tmpfs mappings, as described in detail in commit 7677f7fd8be7 ("userfaultfd: add minor fault registration mode").
To use the same mechanism for VMs that use guest_memfd to map their memory, guest_memfd should support userfaultfd minor mode.
Extend the ->fault() method of guest_memfd with the ability to notify the core page fault handler that a page fault requires handle_userfault(VM_UFFD_MINOR) to complete, and add an implementation of ->get_folio_noalloc() to guest_memfd vm_ops.
Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
 virt/kvm/guest_memfd.c | 33 ++++++++++++++++++++++++++++++++-
 1 file changed, 32 insertions(+), 1 deletion(-)
diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c
index ffadc5ee8e04..dca6e373937b 100644
--- a/virt/kvm/guest_memfd.c
+++ b/virt/kvm/guest_memfd.c
@@ -4,6 +4,7 @@
 #include <linux/kvm_host.h>
 #include <linux/pagemap.h>
 #include <linux/anon_inodes.h>
+#include <linux/userfaultfd_k.h>

 #include "kvm_mm.h"

@@ -359,7 +360,15 @@ static vm_fault_t kvm_gmem_fault_user_mapping(struct vm_fault *vmf)
 	if (!((u64)inode->i_private & GUEST_MEMFD_FLAG_INIT_SHARED))
 		return VM_FAULT_SIGBUS;

-	folio = kvm_gmem_get_folio(inode, vmf->pgoff);
+	folio = filemap_lock_folio(inode->i_mapping, vmf->pgoff);
+	if (!IS_ERR_OR_NULL(folio) && userfaultfd_minor(vmf->vma)) {
+		ret = VM_FAULT_UFFD_MINOR;
+		goto out_folio;
+	}
I realised that I might have been wrong in [1] when I said that the noalloc get folio was ok for our use case. Unfortunately, we rely on a minor fault being generated even when the page is being allocated. Peter and I discussed it originally in [2]. Since we want to populate guest memory on demand with content supplied by userspace, we have to be able to intercept the very first access, meaning we need either a minor or a major UFFD event for that. At the time, we decided to make use of the minor one. If we have to preserve the shmem semantics, that forces us to implement support for major/UFFDIO_COPY.
If we want missing semantics then likely we should be adding ... missing support? :)
I believe I found the precise point where we convinced ourselves that minor support was sufficient: [1]. If at this moment we don't find that reasoning valid anymore, then indeed implementing missing is the only option.
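To make the difference concrete, here is a toy model of which userfaultfd event a first access produces under shmem-like semantics. It is only a sketch and not kernel code: the names model_event, model_reg, shmem_like_fault() and first_touch_example() are made up for illustration, and the model assumes that a missing event fires only when no folio is in the page cache and a minor event only when one already is.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only; these names are hypothetical and do not exist in the kernel. */
enum model_event {
	MODEL_NO_EVENT,	/* fault is resolved by the kernel without notifying userspace */
	MODEL_MISSING,	/* userspace would typically resolve it with UFFDIO_COPY */
	MODEL_MINOR,	/* userspace would typically resolve it with UFFDIO_CONTINUE */
};

/* Which userfaultfd modes the VMA is registered with. */
struct model_reg {
	bool missing;
	bool minor;
};

/*
 * Assumed shmem-like semantics: a missing event requires that no folio is
 * in the page cache, a minor event requires that one is already there.
 */
enum model_event shmem_like_fault(bool folio_in_page_cache, struct model_reg reg)
{
	if (!folio_in_page_cache)
		return reg.missing ? MODEL_MISSING : MODEL_NO_EVENT;
	return reg.minor ? MODEL_MINOR : MODEL_NO_EVENT;
}

/* The situation discussed in this thread, expressed in the model. */
void first_touch_example(void)
{
	struct model_reg minor_only = { .missing = false, .minor = true };
	struct model_reg missing_only = { .missing = true, .minor = false };

	/* Very first access, nothing allocated yet: minor-only sees nothing. */
	assert(shmem_like_fault(false, minor_only) == MODEL_NO_EVENT);
	/* The same access with missing registered can be intercepted. */
	assert(shmem_like_fault(false, missing_only) == MODEL_MISSING);
	/* Once the folio exists, minor mode does fire. */
	assert(shmem_like_fault(true, minor_only) == MODEL_MINOR);
}
```

In this model, a minor-only registration cannot observe the first touch of a page that has never been allocated, which is the case we depend on.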
[1] https://lore.kernel.org/kvm/Z9GsIDVYWoV8d8-C@x1.local
--
Cheers

David