On 08/12/24 at 06:16pm, Will Deacon wrote:
Commit 8c61291fd850 ("mm: fix incorrect vbq reference in purge_fragmented_block") extended the 'vmap_block' structure to contain a 'cpu' field which is set at allocation time to the id of the initialising CPU.
When a new 'vmap_block' is being instantiated by new_vmap_block(), the partially initialised structure is added to the local 'vmap_block_queue' xarray before the 'cpu' field has been initialised. If another CPU is concurrently walking the xarray (e.g. via vm_unmap_aliases()), then it may perform an out-of-bounds access to the remote queue thanks to an uninitialised index.
This has been observed as UBSAN errors in Android:
| Internal error: UBSAN: array index out of bounds: 00000000f2005512 [#1] PREEMPT SMP | | Call trace: | purge_fragmented_block+0x204/0x21c | _vm_unmap_aliases+0x170/0x378 | vm_unmap_aliases+0x1c/0x28 | change_memory_common+0x1dc/0x26c | set_memory_ro+0x18/0x24 | module_enable_ro+0x98/0x238 | do_init_module+0x1b0/0x310
Move the initialisation of 'vb->cpu' in new_vmap_block() ahead of the addition to the xarray.
Cc: Zhaoyang Huang zhaoyang.huang@unisoc.com Cc: Hailong.Liu hailong.liu@oppo.com Cc: Uladzislau Rezki (Sony) urezki@gmail.com Cc: Baoquan He bhe@redhat.com Cc: Christoph Hellwig hch@infradead.org Cc: Lorenzo Stoakes lstoakes@gmail.com Cc: Thomas Gleixner tglx@linutronix.de Cc: Andrew Morton akpm@linux-foundation.org Cc: stable@vger.kernel.org Fixes: 8c61291fd850 ("mm: fix incorrect vbq reference in purge_fragmented_block") Signed-off-by: Will Deacon will@kernel.org
Good catch, this truly could happen and collapse system.
Reviewed-by: Baoquan He bhe@redhat.com
I _think_ the insertion into the free list is ok, as the vb shouldn't be considered for purging if it's clean. It would be great if somebody more familiar with this code could confirm either way, however.
It's OK, please see below comment.
static void *new_vmap_block(unsigned int order, gfp_t gfp_mask) { ...... vaddr = vmap_block_vaddr(va->va_start, 0); spin_lock_init(&vb->lock); vb->va = va; /* At least something should be left free */ BUG_ON(VMAP_BBMAP_BITS <= (1UL << order)); bitmap_zero(vb->used_map, VMAP_BBMAP_BITS); vb->free = VMAP_BBMAP_BITS - (1UL << order); ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Here we have cut away one piece according to vb_alloc() and set vb->free. vb->dirty = 0; vb->dirty_min = VMAP_BBMAP_BITS; vb->dirty_max = 0; bitmap_set(vb->used_map, 0, (1UL << order)); INIT_LIST_HEAD(&vb->free_list); ... }
static bool purge_fragmented_block(struct vmap_block *vb, struct list_head *purge_list, bool force_purge) { struct vmap_block_queue *vbq = &per_cpu(vmap_block_queue, vb->cpu);
if (vb->free + vb->dirty != VMAP_BBMAP_BITS || ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ The above setting of vb->free and vb->dirty will fail conditional check here. So it won't be purged. vb->dirty == VMAP_BBMAP_BITS) return false;
/* Don't overeagerly purge usable blocks unless requested */ if (!(force_purge || vb->free < VMAP_PURGE_THRESHOLD)) return false; ... }
mm/vmalloc.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/mm/vmalloc.c b/mm/vmalloc.c index 6b783baf12a1..64c0a2c8a73c 100644 --- a/mm/vmalloc.c +++ b/mm/vmalloc.c @@ -2626,6 +2626,7 @@ static void *new_vmap_block(unsigned int order, gfp_t gfp_mask) vb->dirty_max = 0; bitmap_set(vb->used_map, 0, (1UL << order)); INIT_LIST_HEAD(&vb->free_list);
- vb->cpu = raw_smp_processor_id();
xa = addr_to_vb_xa(va->va_start); vb_idx = addr_to_vb_idx(va->va_start); @@ -2642,7 +2643,6 @@ static void *new_vmap_block(unsigned int order, gfp_t gfp_mask) * integrity together with list_for_each_rcu from read * side. */
- vb->cpu = raw_smp_processor_id(); vbq = per_cpu_ptr(&vmap_block_queue, vb->cpu); spin_lock(&vbq->lock); list_add_tail_rcu(&vb->free_list, &vbq->free);
-- 2.46.0.76.ge559c4bf1a-goog