The patch below does not apply to the 6.12-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to stable@vger.kernel.org.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.12.y
git checkout FETCH_HEAD
git cherry-pick -x ae431059e75d36170a5ae6b44cc4d06d43613215
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to 'stable@vger.kernel.org' --in-reply-to '2025112009-getaway-overplay-a36a@gregkh' --subject-prefix 'PATCH 6.12.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From ae431059e75d36170a5ae6b44cc4d06d43613215 Mon Sep 17 00:00:00 2001
From: Sean Christopherson <seanjc@google.com>
Date: Mon, 3 Nov 2025 17:12:05 -0800
Subject: [PATCH] KVM: guest_memfd: Remove bindings on memslot deletion when
 gmem is dying
When unbinding a memslot from a guest_memfd instance, remove the bindings even if the guest_memfd file is dying, i.e. even if its file refcount has gone to zero. If the memslot is freed before the file is fully released, nullifying the memslot side of the binding in kvm_gmem_release() will write to freed memory, as detected by syzbot+KASAN:
==================================================================
BUG: KASAN: slab-use-after-free in kvm_gmem_release+0x176/0x440 virt/kvm/guest_memfd.c:353
Write of size 8 at addr ffff88807befa508 by task syz.0.17/6022

CPU: 0 UID: 0 PID: 6022 Comm: syz.0.17 Not tainted syzkaller #0 PREEMPT(full)
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 10/02/2025
Call Trace:
 <TASK>
 dump_stack_lvl+0x189/0x250 lib/dump_stack.c:120
 print_address_description mm/kasan/report.c:378 [inline]
 print_report+0xca/0x240 mm/kasan/report.c:482
 kasan_report+0x118/0x150 mm/kasan/report.c:595
 kvm_gmem_release+0x176/0x440 virt/kvm/guest_memfd.c:353
 __fput+0x44c/0xa70 fs/file_table.c:468
 task_work_run+0x1d4/0x260 kernel/task_work.c:227
 resume_user_mode_work include/linux/resume_user_mode.h:50 [inline]
 exit_to_user_mode_loop+0xe9/0x130 kernel/entry/common.c:43
 exit_to_user_mode_prepare include/linux/irq-entry-common.h:225 [inline]
 syscall_exit_to_user_mode_work include/linux/entry-common.h:175 [inline]
 syscall_exit_to_user_mode include/linux/entry-common.h:210 [inline]
 do_syscall_64+0x2bd/0xfa0 arch/x86/entry/syscall_64.c:100
 entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7fbeeff8efc9
 </TASK>

Allocated by task 6023:
 kasan_save_stack mm/kasan/common.c:56 [inline]
 kasan_save_track+0x3e/0x80 mm/kasan/common.c:77
 poison_kmalloc_redzone mm/kasan/common.c:397 [inline]
 __kasan_kmalloc+0x93/0xb0 mm/kasan/common.c:414
 kasan_kmalloc include/linux/kasan.h:262 [inline]
 __kmalloc_cache_noprof+0x3e2/0x700 mm/slub.c:5758
 kmalloc_noprof include/linux/slab.h:957 [inline]
 kzalloc_noprof include/linux/slab.h:1094 [inline]
 kvm_set_memory_region+0x747/0xb90 virt/kvm/kvm_main.c:2104
 kvm_vm_ioctl_set_memory_region+0x6f/0xd0 virt/kvm/kvm_main.c:2154
 kvm_vm_ioctl+0x957/0xc60 virt/kvm/kvm_main.c:5201
 vfs_ioctl fs/ioctl.c:51 [inline]
 __do_sys_ioctl fs/ioctl.c:597 [inline]
 __se_sys_ioctl+0xfc/0x170 fs/ioctl.c:583
 do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
 do_syscall_64+0xfa/0xfa0 arch/x86/entry/syscall_64.c:94
 entry_SYSCALL_64_after_hwframe+0x77/0x7f

Freed by task 6023:
 kasan_save_stack mm/kasan/common.c:56 [inline]
 kasan_save_track+0x3e/0x80 mm/kasan/common.c:77
 kasan_save_free_info+0x46/0x50 mm/kasan/generic.c:584
 poison_slab_object mm/kasan/common.c:252 [inline]
 __kasan_slab_free+0x5c/0x80 mm/kasan/common.c:284
 kasan_slab_free include/linux/kasan.h:234 [inline]
 slab_free_hook mm/slub.c:2533 [inline]
 slab_free mm/slub.c:6622 [inline]
 kfree+0x19a/0x6d0 mm/slub.c:6829
 kvm_set_memory_region+0x9c4/0xb90 virt/kvm/kvm_main.c:2130
 kvm_vm_ioctl_set_memory_region+0x6f/0xd0 virt/kvm/kvm_main.c:2154
 kvm_vm_ioctl+0x957/0xc60 virt/kvm/kvm_main.c:5201
 vfs_ioctl fs/ioctl.c:51 [inline]
 __do_sys_ioctl fs/ioctl.c:597 [inline]
 __se_sys_ioctl+0xfc/0x170 fs/ioctl.c:583
 do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
 do_syscall_64+0xfa/0xfa0 arch/x86/entry/syscall_64.c:94
 entry_SYSCALL_64_after_hwframe+0x77/0x7f
Deliberately don't acquire the filemap invalidate lock when the file is dying, as the lifecycle of f_mapping is outside the purview of KVM. Dereferencing the mapping is *probably* fine, but there's no need to invalidate anything as memslot deletion is responsible for zapping SPTEs, and the only code that can access the dying file is kvm_gmem_release(), whose core code is mutually exclusive with unbinding.
Note, the mutual exclusivity is also what makes it safe to access the bindings on a dying gmem instance. Unbinding either runs with slots_lock held, or after the last reference to the owning "struct kvm" is put, and kvm_gmem_release() nullifies the slot pointer under slots_lock, and puts its reference to the VM after that is done.
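To make that ordering argument concrete, here is a minimal user-space C model of it (a sketch only: the pthread mutex, the single binding pointer, and struct slot_model are illustrative stand-ins for slots_lock, gmem->bindings, and the memslot, not the kernel's code). Because unbind and release serialize on the same lock, release can never write through a binding that unbind has already removed, even if the memslot has since been freed:

#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

static pthread_mutex_t slots_lock = PTHREAD_MUTEX_INITIALIZER;

struct slot_model { void *file; };          /* stand-in for the memslot */

static struct slot_model *binding;          /* stand-in for a gmem->bindings entry */

static void unbind(struct slot_model *slot) /* memslot-deletion path (fixed behavior) */
{
        pthread_mutex_lock(&slots_lock);
        binding = NULL;                     /* remove the binding even if the file is dying */
        slot->file = NULL;
        pthread_mutex_unlock(&slots_lock);
}

static void release(void)                   /* core of the file-release path */
{
        pthread_mutex_lock(&slots_lock);
        if (binding)                        /* binding already gone => no write to freed slot */
                binding->file = NULL;
        pthread_mutex_unlock(&slots_lock);
}

int main(void)
{
        struct slot_model *slot = malloc(sizeof(*slot));

        binding = slot;
        slot->file = (void *)0x1;           /* "bound" */

        unbind(slot);                       /* memslot deleted first... */
        free(slot);                         /* ...and freed */
        release();                          /* file released later: nothing left to touch */

        printf("binding after release: %p\n", (void *)binding);
        return 0;
}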
Reported-by: syzbot+2479e53d0db9b32ae2aa@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/all/68fa7a22.a70a0220.3bf6c6.008b.GAE@google.com
Tested-by: syzbot+2479e53d0db9b32ae2aa@syzkaller.appspotmail.com
Fixes: a7800aa80ea4 ("KVM: Add KVM_CREATE_GUEST_MEMFD ioctl() for guest-specific backing memory")
Cc: stable@vger.kernel.org
Cc: Hillf Danton <hdanton@sina.com>
Reviewed-By: Vishal Annapurve <vannapurve@google.com>
Link: https://patch.msgid.link/20251104011205.3853541-1-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c
index fbca8c0972da..ffadc5ee8e04 100644
--- a/virt/kvm/guest_memfd.c
+++ b/virt/kvm/guest_memfd.c
@@ -623,24 +623,11 @@ int kvm_gmem_bind(struct kvm *kvm, struct kvm_memory_slot *slot,
 	return r;
 }

-void kvm_gmem_unbind(struct kvm_memory_slot *slot)
+static void __kvm_gmem_unbind(struct kvm_memory_slot *slot, struct kvm_gmem *gmem)
 {
 	unsigned long start = slot->gmem.pgoff;
 	unsigned long end = start + slot->npages;
-	struct kvm_gmem *gmem;
-	struct file *file;

-	/*
-	 * Nothing to do if the underlying file was already closed (or is being
-	 * closed right now), kvm_gmem_release() invalidates all bindings.
-	 */
-	file = kvm_gmem_get_file(slot);
-	if (!file)
-		return;
-
-	gmem = file->private_data;
-
-	filemap_invalidate_lock(file->f_mapping);
 	xa_store_range(&gmem->bindings, start, end - 1, NULL, GFP_KERNEL);

 	/*
@@ -648,6 +635,38 @@ void kvm_gmem_unbind(struct kvm_memory_slot *slot)
 	 * cannot see this memslot.
 	 */
 	WRITE_ONCE(slot->gmem.file, NULL);
+}
+
+void kvm_gmem_unbind(struct kvm_memory_slot *slot)
+{
+	struct file *file;
+
+	/*
+	 * Nothing to do if the underlying file was _already_ closed, as
+	 * kvm_gmem_release() invalidates and nullifies all bindings.
+	 */
+	if (!slot->gmem.file)
+		return;
+
+	file = kvm_gmem_get_file(slot);
+
+	/*
+	 * However, if the file is _being_ closed, then the bindings need to be
+	 * removed as kvm_gmem_release() might not run until after the memslot
+	 * is freed.  Note, modifying the bindings is safe even though the file
+	 * is dying as kvm_gmem_release() nullifies slot->gmem.file under
+	 * slots_lock, and only puts its reference to KVM after destroying all
+	 * bindings.  I.e. reaching this point means kvm_gmem_release() hasn't
+	 * yet destroyed the bindings or freed the gmem_file, and can't do so
+	 * until the caller drops slots_lock.
+	 */
+	if (!file) {
+		__kvm_gmem_unbind(slot, slot->gmem.file->private_data);
+		return;
+	}
+
+	filemap_invalidate_lock(file->f_mapping);
+	__kvm_gmem_unbind(slot, file->private_data);
 	filemap_invalidate_unlock(file->f_mapping);
fput(file);
From: Sean Christopherson <seanjc@google.com>
[ Upstream commit 4af18dc6a9204464db76d9771d1f40e2b46bf6ae ]
Refactor guest_memfd usage of __kvm_gmem_get_pfn() to pass the index into the guest_memfd file instead of the gfn, i.e. resolve the index based on the slot+gfn in the caller instead of in __kvm_gmem_get_pfn(). This will allow kvm_gmem_get_pfn() to retrieve and return the specific "struct page", which requires the index into the folio, without redoing the index calculation multiple times (which isn't costly, just hard to follow).
Opportunistically add a kvm_gmem_get_index() helper to make the copy+pasted code easier to understand.
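For reference, the index calculation that the helper encapsulates is plain offset arithmetic: a gfn's page offset into the guest_memfd file is its offset from the slot's base gfn plus the slot's starting page offset into the file. A small stand-alone sketch (struct slot_model and gmem_index() are illustrative names, not the kernel's types):

#include <stdio.h>
#include <stdint.h>

typedef uint64_t gfn_t;
typedef uint64_t pgoff_t;

struct slot_model {
        gfn_t   base_gfn;    /* first guest frame number covered by the slot */
        pgoff_t gmem_pgoff;  /* slot's starting page offset into the file */
};

/* mirrors the arithmetic of kvm_gmem_get_index() */
static pgoff_t gmem_index(const struct slot_model *slot, gfn_t gfn)
{
        return gfn - slot->base_gfn + slot->gmem_pgoff;
}

int main(void)
{
        struct slot_model slot = { .base_gfn = 0x100, .gmem_pgoff = 0x20 };

        /* gfn 0x105 is page 5 of the slot, i.e. page 0x25 of the file */
        printf("index = 0x%llx\n", (unsigned long long)gmem_index(&slot, 0x105));
        return 0;
}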
Signed-off-by: Sean Christopherson <seanjc@google.com>
Tested-by: Dmitry Osipenko <dmitry.osipenko@collabora.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-ID: <20241010182427.1434605-46-seanjc@google.com>
Stable-dep-of: ae431059e75d ("KVM: guest_memfd: Remove bindings on memslot deletion when gmem is dying")
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 virt/kvm/guest_memfd.c | 20 +++++++++++++-------
 1 file changed, 13 insertions(+), 7 deletions(-)
diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c
index bb062d3d24572..73e5db8ef1611 100644
--- a/virt/kvm/guest_memfd.c
+++ b/virt/kvm/guest_memfd.c
@@ -304,6 +304,11 @@ static inline struct file *kvm_gmem_get_file(struct kvm_memory_slot *slot)
 	return get_file_active(&slot->gmem.file);
 }

+static pgoff_t kvm_gmem_get_index(struct kvm_memory_slot *slot, gfn_t gfn)
+{
+	return gfn - slot->base_gfn + slot->gmem.pgoff;
+}
+
 static struct file_operations kvm_gmem_fops = {
 	.open		= generic_file_open,
 	.release	= kvm_gmem_release,
@@ -553,12 +558,11 @@ void kvm_gmem_unbind(struct kvm_memory_slot *slot)
 }

 /* Returns a locked folio on success.  */
-static struct folio *
-__kvm_gmem_get_pfn(struct file *file, struct kvm_memory_slot *slot,
-		   gfn_t gfn, kvm_pfn_t *pfn, bool *is_prepared,
-		   int *max_order)
+static struct folio *__kvm_gmem_get_pfn(struct file *file,
+					struct kvm_memory_slot *slot,
+					pgoff_t index, kvm_pfn_t *pfn,
+					bool *is_prepared, int *max_order)
 {
-	pgoff_t index = gfn - slot->base_gfn + slot->gmem.pgoff;
 	struct kvm_gmem *gmem = file->private_data;
 	struct folio *folio;

@@ -594,6 +598,7 @@ __kvm_gmem_get_pfn(struct file *file, struct kvm_memory_slot *slot,
 int kvm_gmem_get_pfn(struct kvm *kvm, struct kvm_memory_slot *slot,
 		     gfn_t gfn, kvm_pfn_t *pfn, int *max_order)
 {
+	pgoff_t index = kvm_gmem_get_index(slot, gfn);
 	struct file *file = kvm_gmem_get_file(slot);
 	struct folio *folio;
 	bool is_prepared = false;
@@ -602,7 +607,7 @@ int kvm_gmem_get_pfn(struct kvm *kvm, struct kvm_memory_slot *slot,
 	if (!file)
 		return -EFAULT;

-	folio = __kvm_gmem_get_pfn(file, slot, gfn, pfn, &is_prepared, max_order);
+	folio = __kvm_gmem_get_pfn(file, slot, index, pfn, &is_prepared, max_order);
 	if (IS_ERR(folio)) {
 		r = PTR_ERR(folio);
 		goto out;
@@ -650,6 +655,7 @@ long kvm_gmem_populate(struct kvm *kvm, gfn_t start_gfn, void __user *src, long
 	for (i = 0; i < npages; i += (1 << max_order)) {
 		struct folio *folio;
 		gfn_t gfn = start_gfn + i;
+		pgoff_t index = kvm_gmem_get_index(slot, gfn);
 		bool is_prepared = false;
 		kvm_pfn_t pfn;

@@ -658,7 +664,7 @@ long kvm_gmem_populate(struct kvm *kvm, gfn_t start_gfn, void __user *src, long
 			break;
 		}

-		folio = __kvm_gmem_get_pfn(file, slot, gfn, &pfn, &is_prepared, &max_order);
+		folio = __kvm_gmem_get_pfn(file, slot, index, &pfn, &is_prepared, &max_order);
 		if (IS_ERR(folio)) {
 			ret = PTR_ERR(folio);
 			break;
From: Yan Zhao <yan.y.zhao@intel.com>
[ Upstream commit 67b43038ce14d6b0673bdffb2052d879065c94ae ]
Remove the RCU-protected attribute from slot->gmem.file. No need to use RCU primitives rcu_assign_pointer()/synchronize_rcu() to update this pointer.
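For intuition, WRITE_ONCE()/READ_ONCE() guarantee only single, non-torn accesses, so all real ordering must come from the locking analyzed in the list that follows. A rough user-space analogue of that publication pattern using C11 relaxed atomics (publish(), lookup(), and struct file_model are illustrative stand-ins, not kernel API):

#include <stdatomic.h>
#include <stdio.h>

struct file_model { int id; };                 /* stand-in for struct file */

static _Atomic(struct file_model *) gmem_file; /* stand-in for slot->gmem.file */

/* WRITE_ONCE() analogue: one non-torn store, no ordering implied. */
static void publish(struct file_model *f)
{
        atomic_store_explicit(&gmem_file, f, memory_order_relaxed);
}

/* READ_ONCE() analogue: one non-torn load, no ordering implied. */
static struct file_model *lookup(void)
{
        return atomic_load_explicit(&gmem_file, memory_order_relaxed);
}

int main(void)
{
        struct file_model f = { .id = 1 };

        publish(&f);                           /* models kvm_gmem_bind() */
        struct file_model *cur = lookup();
        printf("bound file id: %d\n", cur ? cur->id : -1);

        publish(NULL);                         /* models unbind/release nullifying it */
        printf("after unbind: %p\n", (void *)lookup());
        return 0;
}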
- slot->gmem.file is updated in 3 places: kvm_gmem_bind(), kvm_gmem_unbind(), kvm_gmem_release(). All of them are protected by kvm->slots_lock.
- slot->gmem.file is read in 2 paths:

  (1) kvm_gmem_populate
        kvm_gmem_get_file
          __kvm_gmem_get_pfn

  (2) kvm_gmem_get_pfn
        kvm_gmem_get_file
          __kvm_gmem_get_pfn
Path (1) kvm_gmem_populate() requires holding kvm->slots_lock, so slot->gmem.file is protected by the kvm->slots_lock in this path.
Path (2) kvm_gmem_get_pfn() does not require holding kvm->slots_lock. However, it is also not guarded by rcu_read_lock() and rcu_read_unlock(), so the synchronize_rcu() in kvm_gmem_unbind()/kvm_gmem_release() does not actually wait for the readers in kvm_gmem_get_pfn(); there is no RCU read-side critical section for it to wait on.
The path (2) kvm_gmem_get_pfn() is safe without RCU protection because:

a) kvm_gmem_bind() is called on a new memslot, before the memslot is
   visible to kvm_gmem_get_pfn().

b) kvm->srcu ensures that kvm_gmem_unbind() and freeing of a memslot
   occur after the memslot is no longer visible to kvm_gmem_get_pfn().

c) get_file_active() ensures that kvm_gmem_get_pfn() will not access the
   stale file if kvm_gmem_release() sets it to NULL. This is because if
   kvm_gmem_release() occurs before kvm_gmem_get_pfn(), get_file_active()
   will return NULL; if get_file_active() does not return NULL,
   kvm_gmem_release() should not occur until after kvm_gmem_get_pfn()
   releases the file reference (see the sketch after this list).
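The guard in point (c) is an increment-if-not-zero refcount pattern. Below is a minimal user-space model of it using C11 atomics; get_active() and put() are illustrative stand-ins for get_file_active() and fput(), not the kernel's file_ref implementation (which also provides the put/get barriers described in the release-path comment):

#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

static atomic_long refcount = 1;      /* the file's reference count */

/* ~ get_file_active(): succeed only while the count is still nonzero,
 * so a dying (count == 0) file can never be resurrected. */
static bool get_active(void)
{
        long c = atomic_load(&refcount);

        while (c > 0)
                if (atomic_compare_exchange_weak(&refcount, &c, c + 1))
                        return true;
        return false;
}

static void put(void)                 /* ~ fput() */
{
        atomic_fetch_sub(&refcount, 1);
}

int main(void)
{
        if (get_active()) {           /* pins the file; release can't run yet */
                puts("got a reference");
                put();
        }

        put();                        /* last reference: the file is now dying */
        printf("lookup on dying file succeeds? %d\n", get_active());  /* 0 */
        return 0;
}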
Signed-off-by: Yan Zhao <yan.y.zhao@intel.com>
Message-ID: <20241104084303.29909-1-yan.y.zhao@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Stable-dep-of: ae431059e75d ("KVM: guest_memfd: Remove bindings on memslot deletion when gmem is dying")
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 include/linux/kvm_host.h |  7 ++++++-
 virt/kvm/guest_memfd.c   | 34 +++++++++++++++++++++------------
 2 files changed, 27 insertions(+), 14 deletions(-)
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 2e836d44f7386..a2ebc37a29ff4 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -608,7 +608,12 @@ struct kvm_memory_slot {

 #ifdef CONFIG_KVM_PRIVATE_MEM
 	struct {
-		struct file __rcu *file;
+		/*
+		 * Writes protected by kvm->slots_lock.  Acquiring a
+		 * reference via kvm_gmem_get_file() is protected by
+		 * either kvm->slots_lock or kvm->srcu.
+		 */
+		struct file *file;
 		pgoff_t pgoff;
 	} gmem;
 #endif
diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c
index 73e5db8ef1611..343ed456b89b0 100644
--- a/virt/kvm/guest_memfd.c
+++ b/virt/kvm/guest_memfd.c
@@ -261,15 +261,19 @@ static int kvm_gmem_release(struct inode *inode, struct file *file)
 	 * dereferencing the slot for existing bindings needs to be protected
 	 * against memslot updates, specifically so that unbind doesn't race
 	 * and free the memslot (kvm_gmem_get_file() will return NULL).
+	 *
+	 * Since .release is called only when the reference count is zero,
+	 * after which file_ref_get() and get_file_active() fail,
+	 * kvm_gmem_get_pfn() cannot be using the file concurrently.
+	 * file_ref_put() provides a full barrier, and get_file_active() the
+	 * matching acquire barrier.
 	 */
 	mutex_lock(&kvm->slots_lock);

 	filemap_invalidate_lock(inode->i_mapping);

 	xa_for_each(&gmem->bindings, index, slot)
-		rcu_assign_pointer(slot->gmem.file, NULL);
-
-	synchronize_rcu();
+		WRITE_ONCE(slot->gmem.file, NULL);

 	/*
 	 * All in-flight operations are gone and new bindings can be created.
@@ -298,8 +302,7 @@ static inline struct file *kvm_gmem_get_file(struct kvm_memory_slot *slot)
 	/*
 	 * Do not return slot->gmem.file if it has already been closed;
 	 * there might be some time between the last fput() and when
-	 * kvm_gmem_release() clears slot->gmem.file, and you do not
-	 * want to spin in the meanwhile.
+	 * kvm_gmem_release() clears slot->gmem.file.
 	 */
 	return get_file_active(&slot->gmem.file);
 }
@@ -510,11 +513,11 @@ int kvm_gmem_bind(struct kvm *kvm, struct kvm_memory_slot *slot,
 	}

 	/*
-	 * No synchronize_rcu() needed, any in-flight readers are guaranteed to
-	 * be see either a NULL file or this new file, no need for them to go
-	 * away.
+	 * memslots of flag KVM_MEM_GUEST_MEMFD are immutable to change, so
+	 * kvm_gmem_bind() must occur on a new memslot.  Because the memslot
+	 * is not visible yet, kvm_gmem_get_pfn() is guaranteed to see the file.
 	 */
-	rcu_assign_pointer(slot->gmem.file, file);
+	WRITE_ONCE(slot->gmem.file, file);
 	slot->gmem.pgoff = start;

 	xa_store_range(&gmem->bindings, start, end - 1, slot, GFP_KERNEL);
@@ -550,8 +553,12 @@ void kvm_gmem_unbind(struct kvm_memory_slot *slot)

 	filemap_invalidate_lock(file->f_mapping);
 	xa_store_range(&gmem->bindings, start, end - 1, NULL, GFP_KERNEL);
-	rcu_assign_pointer(slot->gmem.file, NULL);
-	synchronize_rcu();
+
+	/*
+	 * synchronize_srcu(&kvm->srcu) ensured that kvm_gmem_get_pfn()
+	 * cannot see this memslot.
+	 */
+	WRITE_ONCE(slot->gmem.file, NULL);
 	filemap_invalidate_unlock(file->f_mapping);

 	fput(file);
@@ -563,11 +570,12 @@ static struct folio *__kvm_gmem_get_pfn(struct file *file,
 					pgoff_t index, kvm_pfn_t *pfn,
 					bool *is_prepared, int *max_order)
 {
+	struct file *gmem_file = READ_ONCE(slot->gmem.file);
 	struct kvm_gmem *gmem = file->private_data;
 	struct folio *folio;

-	if (file != slot->gmem.file) {
-		WARN_ON_ONCE(slot->gmem.file);
+	if (file != gmem_file) {
+		WARN_ON_ONCE(gmem_file);
 		return ERR_PTR(-EFAULT);
 	}
From: Sean Christopherson <seanjc@google.com>
[ Upstream commit ae431059e75d36170a5ae6b44cc4d06d43613215 ]
When unbinding a memslot from a guest_memfd instance, remove the bindings even if the guest_memfd file is dying, i.e. even if its file refcount has gone to zero. If the memslot is freed before the file is fully released, nullifying the memslot side of the binding in kvm_gmem_release() will write to freed memory, as detected by syzbot+KASAN:
==================================================================
BUG: KASAN: slab-use-after-free in kvm_gmem_release+0x176/0x440 virt/kvm/guest_memfd.c:353
Write of size 8 at addr ffff88807befa508 by task syz.0.17/6022

CPU: 0 UID: 0 PID: 6022 Comm: syz.0.17 Not tainted syzkaller #0 PREEMPT(full)
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 10/02/2025
Call Trace:
 <TASK>
 dump_stack_lvl+0x189/0x250 lib/dump_stack.c:120
 print_address_description mm/kasan/report.c:378 [inline]
 print_report+0xca/0x240 mm/kasan/report.c:482
 kasan_report+0x118/0x150 mm/kasan/report.c:595
 kvm_gmem_release+0x176/0x440 virt/kvm/guest_memfd.c:353
 __fput+0x44c/0xa70 fs/file_table.c:468
 task_work_run+0x1d4/0x260 kernel/task_work.c:227
 resume_user_mode_work include/linux/resume_user_mode.h:50 [inline]
 exit_to_user_mode_loop+0xe9/0x130 kernel/entry/common.c:43
 exit_to_user_mode_prepare include/linux/irq-entry-common.h:225 [inline]
 syscall_exit_to_user_mode_work include/linux/entry-common.h:175 [inline]
 syscall_exit_to_user_mode include/linux/entry-common.h:210 [inline]
 do_syscall_64+0x2bd/0xfa0 arch/x86/entry/syscall_64.c:100
 entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7fbeeff8efc9
 </TASK>

Allocated by task 6023:
 kasan_save_stack mm/kasan/common.c:56 [inline]
 kasan_save_track+0x3e/0x80 mm/kasan/common.c:77
 poison_kmalloc_redzone mm/kasan/common.c:397 [inline]
 __kasan_kmalloc+0x93/0xb0 mm/kasan/common.c:414
 kasan_kmalloc include/linux/kasan.h:262 [inline]
 __kmalloc_cache_noprof+0x3e2/0x700 mm/slub.c:5758
 kmalloc_noprof include/linux/slab.h:957 [inline]
 kzalloc_noprof include/linux/slab.h:1094 [inline]
 kvm_set_memory_region+0x747/0xb90 virt/kvm/kvm_main.c:2104
 kvm_vm_ioctl_set_memory_region+0x6f/0xd0 virt/kvm/kvm_main.c:2154
 kvm_vm_ioctl+0x957/0xc60 virt/kvm/kvm_main.c:5201
 vfs_ioctl fs/ioctl.c:51 [inline]
 __do_sys_ioctl fs/ioctl.c:597 [inline]
 __se_sys_ioctl+0xfc/0x170 fs/ioctl.c:583
 do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
 do_syscall_64+0xfa/0xfa0 arch/x86/entry/syscall_64.c:94
 entry_SYSCALL_64_after_hwframe+0x77/0x7f

Freed by task 6023:
 kasan_save_stack mm/kasan/common.c:56 [inline]
 kasan_save_track+0x3e/0x80 mm/kasan/common.c:77
 kasan_save_free_info+0x46/0x50 mm/kasan/generic.c:584
 poison_slab_object mm/kasan/common.c:252 [inline]
 __kasan_slab_free+0x5c/0x80 mm/kasan/common.c:284
 kasan_slab_free include/linux/kasan.h:234 [inline]
 slab_free_hook mm/slub.c:2533 [inline]
 slab_free mm/slub.c:6622 [inline]
 kfree+0x19a/0x6d0 mm/slub.c:6829
 kvm_set_memory_region+0x9c4/0xb90 virt/kvm/kvm_main.c:2130
 kvm_vm_ioctl_set_memory_region+0x6f/0xd0 virt/kvm/kvm_main.c:2154
 kvm_vm_ioctl+0x957/0xc60 virt/kvm/kvm_main.c:5201
 vfs_ioctl fs/ioctl.c:51 [inline]
 __do_sys_ioctl fs/ioctl.c:597 [inline]
 __se_sys_ioctl+0xfc/0x170 fs/ioctl.c:583
 do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
 do_syscall_64+0xfa/0xfa0 arch/x86/entry/syscall_64.c:94
 entry_SYSCALL_64_after_hwframe+0x77/0x7f
Deliberately don't acquire the filemap invalidate lock when the file is dying, as the lifecycle of f_mapping is outside the purview of KVM. Dereferencing the mapping is *probably* fine, but there's no need to invalidate anything as memslot deletion is responsible for zapping SPTEs, and the only code that can access the dying file is kvm_gmem_release(), whose core code is mutually exclusive with unbinding.
Note, the mutual exclusivity is also what makes it safe to access the bindings on a dying gmem instance. Unbinding either runs with slots_lock held, or after the last reference to the owning "struct kvm" is put, and kvm_gmem_release() nullifies the slot pointer under slots_lock, and puts its reference to the VM after that is done.
Reported-by: syzbot+2479e53d0db9b32ae2aa@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/all/68fa7a22.a70a0220.3bf6c6.008b.GAE@google.com
Tested-by: syzbot+2479e53d0db9b32ae2aa@syzkaller.appspotmail.com
Fixes: a7800aa80ea4 ("KVM: Add KVM_CREATE_GUEST_MEMFD ioctl() for guest-specific backing memory")
Cc: stable@vger.kernel.org
Cc: Hillf Danton <hdanton@sina.com>
Reviewed-By: Vishal Annapurve <vannapurve@google.com>
Link: https://patch.msgid.link/20251104011205.3853541-1-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 virt/kvm/guest_memfd.c | 45 ++++++++++++++++++++++++++++++------------
 1 file changed, 32 insertions(+), 13 deletions(-)
diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c
index 343ed456b89b0..e2c7e3d918406 100644
--- a/virt/kvm/guest_memfd.c
+++ b/virt/kvm/guest_memfd.c
@@ -534,31 +534,50 @@ int kvm_gmem_bind(struct kvm *kvm, struct kvm_memory_slot *slot,
 	return r;
 }

-void kvm_gmem_unbind(struct kvm_memory_slot *slot)
+static void __kvm_gmem_unbind(struct kvm_memory_slot *slot, struct kvm_gmem *gmem)
 {
 	unsigned long start = slot->gmem.pgoff;
 	unsigned long end = start + slot->npages;
-	struct kvm_gmem *gmem;
+
+	xa_store_range(&gmem->bindings, start, end - 1, NULL, GFP_KERNEL);
+
+	/*
+	 * synchronize_srcu(&kvm->srcu) ensured that kvm_gmem_get_pfn()
+	 * cannot see this memslot.
+	 */
+	WRITE_ONCE(slot->gmem.file, NULL);
+}
+
+void kvm_gmem_unbind(struct kvm_memory_slot *slot)
+{
 	struct file *file;

 	/*
-	 * Nothing to do if the underlying file was already closed (or is being
-	 * closed right now), kvm_gmem_release() invalidates all bindings.
+	 * Nothing to do if the underlying file was _already_ closed, as
+	 * kvm_gmem_release() invalidates and nullifies all bindings.
 	 */
-	file = kvm_gmem_get_file(slot);
-	if (!file)
+	if (!slot->gmem.file)
 		return;

-	gmem = file->private_data;
-
-	filemap_invalidate_lock(file->f_mapping);
-	xa_store_range(&gmem->bindings, start, end - 1, NULL, GFP_KERNEL);
+	file = kvm_gmem_get_file(slot);

 	/*
-	 * synchronize_srcu(&kvm->srcu) ensured that kvm_gmem_get_pfn()
-	 * cannot see this memslot.
+	 * However, if the file is _being_ closed, then the bindings need to be
+	 * removed as kvm_gmem_release() might not run until after the memslot
+	 * is freed.  Note, modifying the bindings is safe even though the file
+	 * is dying as kvm_gmem_release() nullifies slot->gmem.file under
+	 * slots_lock, and only puts its reference to KVM after destroying all
+	 * bindings.  I.e. reaching this point means kvm_gmem_release() hasn't
+	 * yet destroyed the bindings or freed the gmem_file, and can't do so
+	 * until the caller drops slots_lock.
 	 */
-	WRITE_ONCE(slot->gmem.file, NULL);
+	if (!file) {
+		__kvm_gmem_unbind(slot, slot->gmem.file->private_data);
+		return;
+	}
+
+	filemap_invalidate_lock(file->f_mapping);
+	__kvm_gmem_unbind(slot, file->private_data);
 	filemap_invalidate_unlock(file->f_mapping);
fput(file);