The mmap read lock is used during the shrinker's callback, which means that using the alloc->vma pointer isn't safe as it can race with munmap(). As of commit dd2283f2605e ("mm: mmap: zap pages with read mmap_sem in munmap") the mmap lock is downgraded after the vma has been isolated.
I was able to reproduce this issue by manually adding some delays and triggering page reclaiming through the shrinker's debugfs interface. The following KASAN report confirms the UAF:
==================================================================
BUG: KASAN: slab-use-after-free in zap_page_range_single+0x470/0x4b8
Read of size 8 at addr ffff356ed50e50f0 by task bash/478

CPU: 1 PID: 478 Comm: bash Not tainted 6.6.0-rc5-00055-g1c8b86a3799f-dirty #70
Hardware name: linux,dummy-virt (DT)
Call trace:
 zap_page_range_single+0x470/0x4b8
 binder_alloc_free_page+0x608/0xadc
 __list_lru_walk_one+0x130/0x3b0
 list_lru_walk_node+0xc4/0x22c
 binder_shrink_scan+0x108/0x1dc
 shrinker_debugfs_scan_write+0x2b4/0x500
 full_proxy_write+0xd4/0x140
 vfs_write+0x1ac/0x758
 ksys_write+0xf0/0x1dc
 __arm64_sys_write+0x6c/0x9c

Allocated by task 492:
 kmem_cache_alloc+0x130/0x368
 vm_area_alloc+0x2c/0x190
 mmap_region+0x258/0x18bc
 do_mmap+0x694/0xa60
 vm_mmap_pgoff+0x170/0x29c
 ksys_mmap_pgoff+0x290/0x3a0
 __arm64_sys_mmap+0xcc/0x144

Freed by task 491:
 kmem_cache_free+0x17c/0x3c8
 vm_area_free_rcu_cb+0x74/0x98
 rcu_core+0xa38/0x26d4
 rcu_core_si+0x10/0x1c
 __do_softirq+0x2fc/0xd24

Last potentially related work creation:
 __call_rcu_common.constprop.0+0x6c/0xba0
 call_rcu+0x10/0x1c
 vm_area_free+0x18/0x24
 remove_vma+0xe4/0x118
 do_vmi_align_munmap.isra.0+0x718/0xb5c
 do_vmi_munmap+0xdc/0x1fc
 __vm_munmap+0x10c/0x278
 __arm64_sys_munmap+0x58/0x7c
Fix this issue by performing a vma_lookup() instead, which will fail to find the vma that was isolated before the mmap lock downgrade. Note that this option has better performance than upgrading to a mmap write lock, which would increase contention. Plus, mmap_write_trylock() has recently been removed anyway.
Fixes: dd2283f2605e ("mm: mmap: zap pages with read mmap_sem in munmap")
Cc: stable@vger.kernel.org
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Minchan Kim <minchan@kernel.org>
Signed-off-by: Carlos Llamas <cmllamas@google.com>
---
 drivers/android/binder_alloc.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)
diff --git a/drivers/android/binder_alloc.c b/drivers/android/binder_alloc.c
index e3db8297095a..c4d60d81221b 100644
--- a/drivers/android/binder_alloc.c
+++ b/drivers/android/binder_alloc.c
@@ -1005,7 +1005,9 @@ enum lru_status binder_alloc_free_page(struct list_head *item,
 		goto err_mmget;
 	if (!mmap_read_trylock(mm))
 		goto err_mmap_read_lock_failed;
-	vma = binder_alloc_get_vma(alloc);
+	vma = vma_lookup(mm, page_addr);
+	if (vma && vma != binder_alloc_get_vma(alloc))
+		goto err_invalid_vma;

 	list_lru_isolate(lru, item);
 	spin_unlock(lock);
@@ -1031,6 +1033,8 @@ enum lru_status binder_alloc_free_page(struct list_head *item,
 	mutex_unlock(&alloc->mutex);
 	return LRU_REMOVED_RETRY;

+err_invalid_vma:
+	mmap_read_unlock(mm);
 err_mmap_read_lock_failed:
 	mmput_async(mm);
 err_mmget:
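For reference, reclaim can be forced through the shrinker debugfs interface roughly as follows on a kernel built with CONFIG_SHRINKER_DEBUG; the binder shrinker directory name and its numeric suffix below are assumptions and depend on how the shrinker was registered on the running kernel:

	# debugfs must be mounted
	mount -t debugfs none /sys/kernel/debug

	# hypothetical path; the id suffix varies per boot
	cd /sys/kernel/debug/shrinker/android-binder-*/

	# write "<nid> <nr_to_scan>": scan up to 128 objects on node 0
	echo "0 128" > scan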
* Carlos Llamas <cmllamas@google.com> [231102 15:00]:
The mmap read lock is used during the shrinker's callback, which means that using the alloc->vma pointer isn't safe as it can race with munmap().
I think you know my feelings about the safety of that pointer from previous discussions.
As of commit dd2283f2605e ("mm: mmap: zap pages with read mmap_sem in munmap") the mmap lock is downgraded after the vma has been isolated.
I was able to reproduce this issue by manually adding some delays and triggering page reclaiming through the shrinker's debugfs interface. The following KASAN report confirms the UAF:
==================================================================
BUG: KASAN: slab-use-after-free in zap_page_range_single+0x470/0x4b8
Read of size 8 at addr ffff356ed50e50f0 by task bash/478

CPU: 1 PID: 478 Comm: bash Not tainted 6.6.0-rc5-00055-g1c8b86a3799f-dirty #70
Hardware name: linux,dummy-virt (DT)
Call trace:
 zap_page_range_single+0x470/0x4b8
 binder_alloc_free_page+0x608/0xadc
 __list_lru_walk_one+0x130/0x3b0
 list_lru_walk_node+0xc4/0x22c
 binder_shrink_scan+0x108/0x1dc
 shrinker_debugfs_scan_write+0x2b4/0x500
 full_proxy_write+0xd4/0x140
 vfs_write+0x1ac/0x758
 ksys_write+0xf0/0x1dc
 __arm64_sys_write+0x6c/0x9c

Allocated by task 492:
 kmem_cache_alloc+0x130/0x368
 vm_area_alloc+0x2c/0x190
 mmap_region+0x258/0x18bc
 do_mmap+0x694/0xa60
 vm_mmap_pgoff+0x170/0x29c
 ksys_mmap_pgoff+0x290/0x3a0
 __arm64_sys_mmap+0xcc/0x144

Freed by task 491:
 kmem_cache_free+0x17c/0x3c8
 vm_area_free_rcu_cb+0x74/0x98
 rcu_core+0xa38/0x26d4
 rcu_core_si+0x10/0x1c
 __do_softirq+0x2fc/0xd24

Last potentially related work creation:
 __call_rcu_common.constprop.0+0x6c/0xba0
 call_rcu+0x10/0x1c
 vm_area_free+0x18/0x24
 remove_vma+0xe4/0x118
 do_vmi_align_munmap.isra.0+0x718/0xb5c
 do_vmi_munmap+0xdc/0x1fc
 __vm_munmap+0x10c/0x278
 __arm64_sys_munmap+0x58/0x7c
Fix this issue by performing a vma_lookup() instead, which will fail to find the vma that was isolated before the mmap lock downgrade. Note that this option has better performance than upgrading to a mmap write lock, which would increase contention. Plus, mmap_write_trylock() has recently been removed anyway.
Fixes: dd2283f2605e ("mm: mmap: zap pages with read mmap_sem in munmap")
Cc: stable@vger.kernel.org
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Minchan Kim <minchan@kernel.org>
Signed-off-by: Carlos Llamas <cmllamas@google.com>
---
 drivers/android/binder_alloc.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)
diff --git a/drivers/android/binder_alloc.c b/drivers/android/binder_alloc.c
index e3db8297095a..c4d60d81221b 100644
--- a/drivers/android/binder_alloc.c
+++ b/drivers/android/binder_alloc.c
@@ -1005,7 +1005,9 @@ enum lru_status binder_alloc_free_page(struct list_head *item,
 		goto err_mmget;
 	if (!mmap_read_trylock(mm))
 		goto err_mmap_read_lock_failed;
-	vma = binder_alloc_get_vma(alloc);
+	vma = vma_lookup(mm, page_addr);
+	if (vma && vma != binder_alloc_get_vma(alloc))
+		goto err_invalid_vma;
Doesn't this need to be: if (!vma || vma != binder_alloc_get_vma(alloc))
This way, we catch a different vma and a NULL vma.
Or even, just: if (vma != binder_alloc_get_vma(alloc))
if the alloc vma cannot be NULL?
 	list_lru_isolate(lru, item);
 	spin_unlock(lock);
@@ -1031,6 +1033,8 @@ enum lru_status binder_alloc_free_page(struct list_head *item,
 	mutex_unlock(&alloc->mutex);
 	return LRU_REMOVED_RETRY;

+err_invalid_vma:
+	mmap_read_unlock(mm);
 err_mmap_read_lock_failed:
 	mmput_async(mm);
 err_mmget:
--
2.42.0.869.gea05f2083d-goog
On Thu, Nov 02, 2023 at 03:20:51PM -0400, Liam R. Howlett wrote:
* Carlos Llamas <cmllamas@google.com> [231102 15:00]:
The mmap read lock is used during the shrinker's callback, which means that using the alloc->vma pointer isn't safe as it can race with munmap().
I think you know my feelings about the safety of that pointer from previous discussions.
Yeah. The work here is not done. We actually already store the vm_start address in alloc->buffer, so in theory we don't even need to swap the alloc->vma pointer; we could just drop it. So, I agree with you.
I want to include this safety "fix" along with some other work that uses the page fault handler and get_user_pages_remote(). I've tried a quick prototype of this and it works fine.
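A rough illustration of that idea (a hypothetical helper, not the prototype mentioned above; it assumes alloc->buffer holds the start address of the binder mapping and that the caller already holds the mmap lock):

	/* Hypothetical sketch: look the vma up from the stored start address
	 * instead of caching a pointer in alloc->vma.  The cast covers the
	 * case where alloc->buffer is a __user pointer rather than an
	 * unsigned long.
	 */
	static struct vm_area_struct *binder_lookup_vma(struct binder_alloc *alloc,
							struct mm_struct *mm)
	{
		mmap_assert_locked(mm);
		return vma_lookup(mm, (unsigned long)alloc->buffer);
	}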
diff --git a/drivers/android/binder_alloc.c b/drivers/android/binder_alloc.c
index e3db8297095a..c4d60d81221b 100644
--- a/drivers/android/binder_alloc.c
+++ b/drivers/android/binder_alloc.c
@@ -1005,7 +1005,9 @@ enum lru_status binder_alloc_free_page(struct list_head *item,
 		goto err_mmget;
 	if (!mmap_read_trylock(mm))
 		goto err_mmap_read_lock_failed;
-	vma = binder_alloc_get_vma(alloc);
+	vma = vma_lookup(mm, page_addr);
+	if (vma && vma != binder_alloc_get_vma(alloc))
+		goto err_invalid_vma;
Doesn't this need to be: if (!vma || vma != binder_alloc_get_vma(alloc))
This way, we catch a different vma and a NULL vma.
Or even, just: if (vma != binder_alloc_get_vma(alloc))
if the alloc vma cannot be NULL?
If the vma_lookup() is NULL then we still need to isolate and free the given binder page and we obviously skip the zap() in this case.
However, if we receive a random unexpected vma because of a corrupted address or similar, then the whole process is skipped.
Thus, why we use the check above.
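To spell that out, the behaviour of the check can be annotated roughly as follows (same logic as in the patch; the comments are added here for illustration only):

	vma = vma_lookup(mm, page_addr);
	if (vma && vma != binder_alloc_get_vma(alloc))
		goto err_invalid_vma;	/* unexpected vma: bail out, touch nothing */

	/* ... the binder page is isolated from the LRU either way ... */

	if (vma) {
		/* vma == alloc->vma: normal case, zap the user mapping */
		zap_page_range_single(vma, page_addr, PAGE_SIZE, NULL);
	}
	/* vma == NULL: the mapping is already gone (e.g. isolated by a
	 * racing munmap()), so the zap is skipped but the binder page is
	 * still freed below.
	 */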
--
Carlos Llamas
* Carlos Llamas <cmllamas@google.com> [231102 16:09]:
...
diff --git a/drivers/android/binder_alloc.c b/drivers/android/binder_alloc.c
index e3db8297095a..c4d60d81221b 100644
--- a/drivers/android/binder_alloc.c
+++ b/drivers/android/binder_alloc.c
@@ -1005,7 +1005,9 @@ enum lru_status binder_alloc_free_page(struct list_head *item,
 		goto err_mmget;
 	if (!mmap_read_trylock(mm))
 		goto err_mmap_read_lock_failed;
-	vma = binder_alloc_get_vma(alloc);
+	vma = vma_lookup(mm, page_addr);
+	if (vma && vma != binder_alloc_get_vma(alloc))
+		goto err_invalid_vma;
Doesn't this need to be: if (!vma || vma != binder_alloc_get_vma(alloc))
This way, we catch a different vma and a NULL vma.
Or even, just: if (vma != binder_alloc_get_vma(alloc))
if the alloc vma cannot be NULL?
If the vma_lookup() is NULL then we still need to isolate and free the given binder page and we obviously skip the zap() in this case.
I would have thought if there was no VMA, then the entire process could be avoided. Thanks for clarifying.
However, if we receive a random unexpected vma because of a corrupted address or similar, then the whole process is skipped.
Thus, why we use the check above.
--
Carlos Llamas
Carlos Llamas <cmllamas@google.com> writes:
The mmap read lock is used during the shrinker's callback, which means that using the alloc->vma pointer isn't safe as it can race with munmap(). As of commit dd2283f2605e ("mm: mmap: zap pages with read mmap_sem in munmap") the mmap lock is downgraded after the vma has been isolated.
I was able to reproduce this issue by manually adding some delays and triggering page reclaiming through the shrinker's debugfs interface. The following KASAN report confirms the UAF:
[...snip...]
Fix this issue by performing a vma_lookup() instead, which will fail to find the vma that was isolated before the mmap lock downgrade. Note that this option has better performance than upgrading to a mmap write lock, which would increase contention. Plus, mmap_write_trylock() has recently been removed anyway.
Fixes: dd2283f2605e ("mm: mmap: zap pages with read mmap_sem in munmap")
Cc: stable@vger.kernel.org
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Minchan Kim <minchan@kernel.org>
Signed-off-by: Carlos Llamas <cmllamas@google.com>
This change makes sense to me, and I agree that the code still needs to run when the vma is null.
Reviewed-by: Alice Ryhl <aliceryhl@google.com>