Read mmu_invalidate_seq before dropping the mmap_lock so that KVM can detect if the results of vma_lookup() (e.g. vma_shift) become stale before it acquires kvm->mmu_lock. This fixes a theoretical bug where a VMA could be changed by userspace after vma_lookup() and before KVM reads the mmu_invalidate_seq, causing KVM to install page table entries based on a (possibly) no-longer-valid vma_shift.
Re-order the MMU cache top-up to earlier in user_mem_abort() so that it is not done after KVM has read mmu_invalidate_seq (i.e. so as to avoid inducing spurious fault retries).
This bug has existed since KVM/ARM's inception. It's unlikely that any sane userspace currently modifies VMAs in such a way as to trigger this race. And even with directed testing I was unable to reproduce it. But a sufficiently motivated host userspace might be able to exploit this race.
Fixes: 94f8e6418d39 ("KVM: ARM: Handle guest faults in KVM") Cc: stable@vger.kernel.org Reported-by: Sean Christopherson seanjc@google.com Signed-off-by: David Matlack dmatlack@google.com --- arch/arm64/kvm/mmu.c | 48 +++++++++++++++++++------------------------- 1 file changed, 21 insertions(+), 27 deletions(-)
diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c index 7113587222ff..f54408355d1d 100644 --- a/arch/arm64/kvm/mmu.c +++ b/arch/arm64/kvm/mmu.c @@ -1217,6 +1217,20 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa, return -EFAULT; }
+ /* + * Permission faults just need to update the existing leaf entry, + * and so normally don't require allocations from the memcache. The + * only exception to this is when dirty logging is enabled at runtime + * and a write fault needs to collapse a block entry into a table. + */ + if (fault_status != ESR_ELx_FSC_PERM || + (logging_active && write_fault)) { + ret = kvm_mmu_topup_memory_cache(memcache, + kvm_mmu_cache_min_pages(kvm)); + if (ret) + return ret; + } + /* * Let's check if we will get back a huge page backed by hugetlbfs, or * get block mapping for device MMIO region. @@ -1269,37 +1283,17 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa, fault_ipa &= ~(vma_pagesize - 1);
gfn = fault_ipa >> PAGE_SHIFT; - mmap_read_unlock(current->mm); - - /* - * Permission faults just need to update the existing leaf entry, - * and so normally don't require allocations from the memcache. The - * only exception to this is when dirty logging is enabled at runtime - * and a write fault needs to collapse a block entry into a table. - */ - if (fault_status != ESR_ELx_FSC_PERM || - (logging_active && write_fault)) { - ret = kvm_mmu_topup_memory_cache(memcache, - kvm_mmu_cache_min_pages(kvm)); - if (ret) - return ret; - }
- mmu_seq = vcpu->kvm->mmu_invalidate_seq; /* - * Ensure the read of mmu_invalidate_seq happens before we call - * gfn_to_pfn_prot (which calls get_user_pages), so that we don't risk - * the page we just got a reference to gets unmapped before we have a - * chance to grab the mmu_lock, which ensure that if the page gets - * unmapped afterwards, the call to kvm_unmap_gfn will take it away - * from us again properly. This smp_rmb() interacts with the smp_wmb() - * in kvm_mmu_notifier_invalidate_<page|range_end>. + * Read mmu_invalidate_seq so that KVM can detect if the results of + * vma_lookup() or __gfn_to_pfn_memslot() become stale prior to + * acquiring kvm->mmu_lock. * - * Besides, __gfn_to_pfn_memslot() instead of gfn_to_pfn_prot() is - * used to avoid unnecessary overhead introduced to locate the memory - * slot because it's always fixed even @gfn is adjusted for huge pages. + * Rely on mmap_read_unlock() for an implicit smp_rmb(), which pairs + * with the smp_wmb() in kvm_mmu_invalidate_end(). */ - smp_rmb(); + mmu_seq = vcpu->kvm->mmu_invalidate_seq; + mmap_read_unlock(current->mm);
pfn = __gfn_to_pfn_memslot(memslot, gfn, false, false, NULL, write_fault, &writable, NULL);
base-commit: 96a4627dbbd48144a65af936b321701c70876026
[Dropping Christoffer's 11 year obsolete address...]
On Mon, 13 Mar 2023 23:54:54 +0000, David Matlack dmatlack@google.com wrote:
Read mmu_invalidate_seq before dropping the mmap_lock so that KVM can detect if the results of vma_lookup() (e.g. vma_shift) become stale before it acquires kvm->mmu_lock. This fixes a theoretical bug where a VMA could be changed by userspace after vma_lookup() and before KVM reads the mmu_invalidate_seq, causing KVM to install page table entries based on a (possibly) no-longer-valid vma_shift.
Re-order the MMU cache top-up to earlier in user_mem_abort() so that it is not done after KVM has read mmu_invalidate_seq (i.e. so as to avoid inducing spurious fault retries).
This bug has existed since KVM/ARM's inception. It's unlikely that any sane userspace currently modifies VMAs in such a way as to trigger this race. And even with directed testing I was unable to reproduce it. But a sufficiently motivated host userspace might be able to exploit this race.
Fixes: 94f8e6418d39 ("KVM: ARM: Handle guest faults in KVM")
Ah, good luck with that one! :D user_mem_abort() used to be so nice and simple at the time! And yet...
Cc: stable@vger.kernel.org Reported-by: Sean Christopherson seanjc@google.com Signed-off-by: David Matlack dmatlack@google.com
Reviewed-by: Marc Zyngier maz@kernel.org
Oliver, how do you want to deal with this one? queue it right now? Or wait until the dust settles on my two other patches?
I don't mind either way, I can either take it as part of the same series, or rebase my stuff on it.
Thanks,
M.
On Tue, Mar 14, 2023 at 04:31:38PM +0000, Marc Zyngier wrote:
[Dropping Christoffer's 11 year obsolete address...]
On Mon, 13 Mar 2023 23:54:54 +0000, David Matlack dmatlack@google.com wrote:
Read mmu_invalidate_seq before dropping the mmap_lock so that KVM can detect if the results of vma_lookup() (e.g. vma_shift) become stale before it acquires kvm->mmu_lock. This fixes a theoretical bug where a VMA could be changed by userspace after vma_lookup() and before KVM reads the mmu_invalidate_seq, causing KVM to install page table entries based on a (possibly) no-longer-valid vma_shift.
Re-order the MMU cache top-up to earlier in user_mem_abort() so that it is not done after KVM has read mmu_invalidate_seq (i.e. so as to avoid inducing spurious fault retries).
This bug has existed since KVM/ARM's inception. It's unlikely that any sane userspace currently modifies VMAs in such a way as to trigger this race. And even with directed testing I was unable to reproduce it. But a sufficiently motivated host userspace might be able to exploit this race.
Fixes: 94f8e6418d39 ("KVM: ARM: Handle guest faults in KVM")
Ah, good luck with that one! :D user_mem_abort() used to be so nice and simple at the time! And yet...
Cc: stable@vger.kernel.org Reported-by: Sean Christopherson seanjc@google.com Signed-off-by: David Matlack dmatlack@google.com
Reviewed-by: Marc Zyngier maz@kernel.org
Oliver, how do you want to deal with this one? queue it right now? Or wait until the dust settles on my two other patches?
I don't mind either way, I can either take it as part of the same series, or rebase my stuff on it.
I'll go ahead and grab it if you want to base your series on top of this, thanks both of you!
On Mon, 13 Mar 2023 16:54:54 -0700, David Matlack wrote:
Read mmu_invalidate_seq before dropping the mmap_lock so that KVM can detect if the results of vma_lookup() (e.g. vma_shift) become stale before it acquires kvm->mmu_lock. This fixes a theoretical bug where a VMA could be changed by userspace after vma_lookup() and before KVM reads the mmu_invalidate_seq, causing KVM to install page table entries based on a (possibly) no-longer-valid vma_shift.
[...]
Applied to kvmarm/fixes, thanks!
[1/1] KVM: arm64: Retry fault if vma_lookup() results become invalid https://git.kernel.org/kvmarm/kvmarm/c/13ec9308a857
-- Best, Oliver
linux-stable-mirror@lists.linaro.org