From: Paolo Bonzini <pbonzini@redhat.com>
commit bd2fae8da794b55bf2ac02632da3a151b10e664c upstream.
In order to convert an HVA to a PFN, KVM usually tries to use the get_user_pages family of functions. This, however, is not possible for VM_IO vmas; in that case, KVM instead uses follow_pfn.
In doing so, however, KVM loses the information on whether the PFN is writable. That is usually not a problem, because the main use of VM_IO vmas with KVM is for BARs in PCI device assignment, but it is a bug nonetheless. To fix it, use follow_pte and check pte_write while under the protection of the PTE lock. The information can be used either to fail hva_to_pfn_remapped or to be passed back to the caller via *writable.
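As a minimal sketch of the locked-lookup pattern this relies on (upstream
naming; the 4.19 backport below substitutes follow_pte_pmd() for
follow_pte(), passing NULL for the unused arguments):

	pte_t *ptep;
	spinlock_t *ptl;
	unsigned long pfn;
	int r;

	/* Map and lock the PTE; *ptep cannot change until pte_unmap_unlock(). */
	r = follow_pte(vma->vm_mm, addr, &ptep, &ptl);
	if (r)
		return r;

	if (write_fault && !pte_write(*ptep)) {
		/* A write was requested through a read-only mapping: fail. */
		pfn = KVM_PFN_ERR_RO_FAULT;
	} else {
		if (writable)
			*writable = pte_write(*ptep);
		pfn = pte_pfn(*ptep);
	}
	pte_unmap_unlock(ptep, ptl);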
Usage of follow_pfn was introduced in commit add6a0cd1c5b ("KVM: MMU: try to fix up page faults before giving up", 2016-07-05); however, even older versions have the same issue, all the way back to commit 2e2e3738af33 ("KVM: Handle vma regions with no backing page", 2008-07-20), as they also did not check whether the PFN was writable.
Fixes: 2e2e3738af33 ("KVM: Handle vma regions with no backing page")
Reported-by: David Stevens <stevensd@google.com>
Cc: 3pvd@google.com
Cc: Jann Horn <jannh@google.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
[OP: backport to 4.19, adjust follow_pte() -> follow_pte_pmd()]
Signed-off-by: Ovidiu Panait <ovidiu.panait@windriver.com>
---
 virt/kvm/kvm_main.c | 15 ++++++++++++---
 1 file changed, 12 insertions(+), 3 deletions(-)
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 1ecb27b3421a..6aeac96bf147 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -1495,9 +1495,11 @@ static int hva_to_pfn_remapped(struct vm_area_struct *vma,
 			       kvm_pfn_t *p_pfn)
 {
 	unsigned long pfn;
+	pte_t *ptep;
+	spinlock_t *ptl;
 	int r;
 
-	r = follow_pfn(vma, addr, &pfn);
+	r = follow_pte_pmd(vma->vm_mm, addr, NULL, NULL, &ptep, NULL, &ptl);
 	if (r) {
 		/*
 		 * get_user_pages fails for VM_IO and VM_PFNMAP vmas and does
@@ -1512,14 +1514,19 @@ static int hva_to_pfn_remapped(struct vm_area_struct *vma,
 		if (r)
 			return r;
 
-		r = follow_pfn(vma, addr, &pfn);
+		r = follow_pte_pmd(vma->vm_mm, addr, NULL, NULL, &ptep, NULL, &ptl);
 		if (r)
 			return r;
+	}
 
+	if (write_fault && !pte_write(*ptep)) {
+		pfn = KVM_PFN_ERR_RO_FAULT;
+		goto out;
 	}
 
 	if (writable)
-		*writable = true;
+		*writable = pte_write(*ptep);
+	pfn = pte_pfn(*ptep);
 
 	/*
 	 * Get a reference here because callers of *hva_to_pfn* and
@@ -1534,6 +1541,8 @@ static int hva_to_pfn_remapped(struct vm_area_struct *vma,
 	 */
 	kvm_get_pfn(pfn);
 
+out:
+	pte_unmap_unlock(ptep, ptl);
 	*p_pfn = pfn;
 	return 0;
 }
From: Nicholas Piggin <npiggin@gmail.com>
commit f8be156be163a052a067306417cd0ff679068c97 upstream.
It's possible to create a region which maps valid but non-refcounted pages (e.g., tail pages of non-compound higher-order allocations). These host pages can then be returned by the gfn_to_page, gfn_to_pfn, etc. family of APIs, which take a reference to the page and thereby raise its refcount from 0 to 1. When that reference is later dropped, the page is freed incorrectly.
Fix this by only taking a reference on a valid page if its refcount was already non-zero, which indicates that the page participates in normal refcounting (and can be released with put_page).
This addresses CVE-2021-22543.
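To make the failure mode concrete, a hedged sketch of the broken and fixed
sequences (fragment only; pfn stands for a PFN obtained from such a mapping,
and get_page_unless_zero() is the existing mm helper the fix builds on):

	struct page *page = pfn_to_page(pfn);

	/* Broken: take a reference unconditionally. */
	get_page(page);		/* refcount 0 -> 1 on a non-refcounted page */
	/* ...KVM and the guest use the page... */
	put_page(page);		/* refcount 1 -> 0 frees a page whose real
				 * owner (e.g. the non-compound higher-order
				 * allocation it is a tail page of) is still
				 * using it */

	/* Fixed: take the reference only if the page is really refcounted. */
	if (!get_page_unless_zero(page))	/* atomically fails at refcount 0 */
		return -EFAULT;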
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Tested-by: Paolo Bonzini <pbonzini@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Ovidiu Panait <ovidiu.panait@windriver.com>
---
 virt/kvm/kvm_main.c | 19 +++++++++++++++++--
 1 file changed, 17 insertions(+), 2 deletions(-)
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 6aeac96bf147..3559eba5f502 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -1489,6 +1489,13 @@ static bool vma_is_valid(struct vm_area_struct *vma, bool write_fault)
 	return true;
 }
 
+static int kvm_try_get_pfn(kvm_pfn_t pfn)
+{
+	if (kvm_is_reserved_pfn(pfn))
+		return 1;
+	return get_page_unless_zero(pfn_to_page(pfn));
+}
+
 static int hva_to_pfn_remapped(struct vm_area_struct *vma,
 			       unsigned long addr, bool *async,
 			       bool write_fault, bool *writable,
@@ -1538,13 +1545,21 @@ static int hva_to_pfn_remapped(struct vm_area_struct *vma,
 	 * Whoever called remap_pfn_range is also going to call e.g.
 	 * unmap_mapping_range before the underlying pages are freed,
 	 * causing a call to our MMU notifier.
+	 *
+	 * Certain IO or PFNMAP mappings can be backed with valid
+	 * struct pages, but be allocated without refcounting e.g.,
+	 * tail pages of non-compound higher order allocations, which
+	 * would then underflow the refcount when the caller does the
+	 * required put_page. Don't allow those pages here.
 	 */
-	kvm_get_pfn(pfn);
+	if (!kvm_try_get_pfn(pfn))
+		r = -EFAULT;
 
 out:
 	pte_unmap_unlock(ptep, ptl);
 	*p_pfn = pfn;
-	return 0;
+
+	return r;
 }
 
 /*
On 23/07/21 22:11, Ovidiu Panait wrote:
> From: Nicholas Piggin <npiggin@gmail.com>
>
> commit f8be156be163a052a067306417cd0ff679068c97 upstream.
>
> [...]

Acked-by: Paolo Bonzini <pbonzini@redhat.com>
On 23/07/21 22:11, Ovidiu Panait wrote:
> From: Paolo Bonzini <pbonzini@redhat.com>
>
> commit bd2fae8da794b55bf2ac02632da3a151b10e664c upstream.
>
> [...]

Acked-by: Paolo Bonzini <pbonzini@redhat.com>
On Fri, Jul 23, 2021 at 11:11:33PM +0300, Ovidiu Panait wrote:
> From: Paolo Bonzini <pbonzini@redhat.com>
>
> commit bd2fae8da794b55bf2ac02632da3a151b10e664c upstream.
Both now queued up, thanks!
greg k-h