Re: [PATCH 1/2] mm: cma: fix allocation may fail sometimes

16 Dec 2021

      On 16.12.21 03:54, Aisheng Dong wrote:
...
...
From: David Hildenbrand david@redhat.com
Sent: Wednesday, December 15, 2021 8:31 PM
On 15.12.21 09:02, Dong Aisheng wrote:
...
We sometimes see dma_alloc_coherent() fail when running 8 VPU decoder
tests in parallel on an MX6Q SDB board.
Error log:
cma: cma_alloc: linux,cma: alloc failed, req-size: 148 pages, ret: -16
cma: number of available pages:
3@125+20@172+12@236+4@380+32@736+17@2287+23@2473+20@36076+99@40477+108
...
@40852+44@41108+20@41196+108@41364+108@41620+
108@42900+108@43156+483@44061+1763@45341+1440@47712+20@49324+20@49388+
...
5076@49452+2304@55040+35@58141+20@58220+20@58284+
7188@58348+84@66220+7276@66452+227@74525+6371@75549=>
33161 free of
...
81920 total pages
When the issue happened, there were still 33161 pages (129M) of free
CMA memory, and the CMA bitmap had plenty of free slots large enough
for the 148 pages we wanted to allocate.
Dumping the memory info showed that there was also ~342M of free normal
memory, but only 1352K of CMA memory was left in the buddy system, while
a lot of pageblocks were isolated.
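As a quick sanity check of the numbers above, the following sketch parses the last part of the bitmap dump (only entries that are fully visible in the log) and counts the free runs that are at least 148 pages long. It assumes 4 KiB pages; parse_runs() and runs_that_fit() are illustrative names, not kernel functions. Alignment requirements are ignored, so the count is an upper bound on usable slots.

```python
PAGE_KIB = 4  # assumption: 4 KiB pages


def parse_runs(dump):
    """Parse 'count@offset+count@offset+...' into (count, offset) tuples."""
    runs = []
    for item in dump.split("+"):
        count, offset = item.split("@")
        runs.append((int(count), int(offset)))
    return runs


def runs_that_fit(runs, req_pages):
    """Return the free runs that are long enough for req_pages."""
    return [(count, offset) for count, offset in runs if count >= req_pages]


# Tail of the dump quoted in the error log (fully visible entries only).
TAIL = ("5076@49452+2304@55040+35@58141+20@58220+20@58284+"
        "7188@58348+84@66220+7276@66452+227@74525+6371@75549")

runs = parse_runs(TAIL)
fit = runs_that_fit(runs, 148)
print(len(fit), "of", len(runs), "runs can hold 148 pages")
print("free CMA: %.1f MiB" % (33161 * PAGE_KIB / 1024))
print("free_cma left in buddy: %d pages" % (1352 // PAGE_KIB))
```

Even in this partial view several runs are far larger than the request, which matches the observation that the failure is not caused by a lack of free bitmap space.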
Memory info log:
Normal free:351096kB min:30000kB low:37500kB high:45000kB
reserved_highatomic:0KB
...
   active_anon:98060kB inactive_anon:98948kB active_file:60864kB
   inactive_file:31776kB
...
   unevictable:0kB writepending:0kB present:1048576kB
   managed:1018328kB mlocked:0kB
...
   bounce:0kB free_pcp:220kB local_pcp:192kB free_cma:1352kB

lowmem_reserve[]: 0 0 0
Normal: 78*4kB (UECI) 1772*8kB (UMECI) 1335*16kB (UMECI) 360*32kB
(UMECI) 65*64kB (UMCI)
...
36*128kB (UMECI) 16*256kB (UMCI) 6*512kB (EI) 8*1024kB (UEI)
4*2048kB (MI) 8*4096kB (EI)
...
8*8192kB (UI) 3*16384kB (EI) 8*32768kB (M) = 489288kB
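The per-order free list can be cross-checked with a few lines of arithmetic. The sketch below sums the "count*size" entries and compares the result with the "free:" counter from the same dump. The interpretation in the comments is an assumption, not something stated in the report: free pages sitting in isolated pageblocks (the "I" flag) appear in the per-order list but are not included in the "free:" counter. buddy_total_kb() is an illustrative helper name.

```python
import re

# Per-order entries copied from the "Normal:" line, migrate-type flags dropped.
BUDDY = ("78*4kB 1772*8kB 1335*16kB 360*32kB 65*64kB 36*128kB 16*256kB "
         "6*512kB 8*1024kB 4*2048kB 8*4096kB 8*8192kB 3*16384kB 8*32768kB")


def buddy_total_kb(line):
    """Sum count*size over all 'N*SkB' entries in a free-list line."""
    return sum(int(n) * int(size)
               for n, size in re.findall(r"(\d+)\*(\d+)kB", line))


total_kb = buddy_total_kb(BUDDY)
reported_free_kb = 351096  # "Normal free:" value from the dump

print(total_kb)                      # 489288, same as the dump's own total
# Assumption: the gap is free memory held in isolated pageblocks.
print(total_kb - reported_free_kb)   # 138192
```

If that assumption holds, roughly 135M of free memory was sitting in isolated pageblocks at the time of the dump, which would be consistent with the statement that a lot of pageblocks were isolated.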
The root cause of this issue is that since commit a4efc174b382
("mm/cma.c: remove redundant cma_mutex lock"), CMA supports concurrent
memory allocation. It is possible that the pageblock process A tries to
allocate has already been isolated by the allocation of process B during
memory migration.
When multiple processes allocate CMA memory in parallel, it is likely
that the remaining pageblocks have also been isolated, so the CMA
allocation eventually fails during the first round of scanning of the
whole available CMA bitmap.
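The failure mode described above can be illustrated with a toy user-space model. This is not the kernel's cma_alloc(): it tries each free run only once at its start, ignores alignment, and assumes 1024-page pageblocks. cma_alloc_model() and its parameters are hypothetical names chosen for the illustration.

```python
EBUSY = 16  # matches "ret: -16" in the error log


def cma_alloc_model(free_runs, isolated_blocks, req_pages, pageblock_pages=1024):
    """Scan the free runs once; return the chosen offset or -EBUSY.

    free_runs:       list of (count, offset) free ranges in the bitmap
    isolated_blocks: pageblock indexes currently isolated by other allocators
    """
    for count, offset in free_runs:
        if count < req_pages:
            continue  # run too small for the request
        first = offset // pageblock_pages
        last = (offset + req_pages - 1) // pageblock_pages
        if any(block in isolated_blocks for block in range(first, last + 1)):
            continue  # range is busy, move on to the next candidate
        return offset
    return -EBUSY  # every candidate was busy during this single pass


free_runs = [(5076, 49452), (2304, 55040)]

print(cma_alloc_model(free_runs, set(), 148))       # nothing isolated
print(cma_alloc_model(free_runs, {48}, 148))        # first candidate busy
print(cma_alloc_model(free_runs, {48, 53}, 148))    # all candidates busy
```

In the last call the bitmap still has thousands of free pages, yet the single scan ends with -EBUSY because each candidate range overlaps a pageblock that another allocation has isolated.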
I already raised in a different context that we should most probably convert that
-EBUSY to -EAGAIN, to differentiate an actual migration problem from a
simple "concurrent allocations that target the same MAX_ORDER - 1 range".
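A minimal sketch of what that distinction could look like from the caller's point of view. No such patch exists in this thread, so the function names and the two boolean inputs are purely hypothetical; the only point is that two different error codes let the caller tell the cases apart.

```python
import errno


def range_alloc_result(isolated_by_other, migration_failed):
    """Hypothetical: errno a range allocation could report per failure cause."""
    if isolated_by_other:
        # Transient: a concurrent allocation targets the same range.
        return -errno.EAGAIN
    if migration_failed:
        # Pages in the range could not be migrated away.
        return -errno.EBUSY
    return 0


def describe(ret):
    """Map the (hypothetical) return value back to the failure cause."""
    if ret == 0:
        return "allocated"
    if ret == -errno.EAGAIN:
        return "concurrent allocation on the same range"
    if ret == -errno.EBUSY:
        return "actual migration problem"
    return "other error"


print(describe(range_alloc_result(True, False)))
print(describe(range_alloc_result(False, True)))
```

With a single -EBUSY for both cases, as in the current code, describe() would have no way to separate the first case from the second.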
Thanks for the info. Is there a patch under review?
No, and I have been too busy to send one out so far.
...
BTW, I wonder whether that makes much difference for my patch, since we may
prefer to retry the next pageblock rather than busy-wait on the same isolated pageblock.
Makes sense. BUT as of now we isolate not only a pageblock but a
MAX_ORDER - 1 page (e.g., 2 pageblocks on x86-64 (!)). So you'll have
the same issue in that case.
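To make the arithmetic concrete, here is a small sketch of how the isolated range grows when it is aligned to the larger of a MAX_ORDER - 1 page and a pageblock. The defaults (pageblock_order=9, max_order=11) are assumptions for a typical x86-64 configuration at the time; isolation_range() is an illustrative helper, not a kernel function.

```python
def isolation_range(start_pfn, nr_pages, pageblock_order=9, max_order=11):
    """Return the (lo, hi) pfn range that would be isolated for a request.

    The range is aligned down/up to max(MAX_ORDER - 1 page, pageblock).
    """
    align = max(1 << (max_order - 1), 1 << pageblock_order)
    lo = start_pfn & ~(align - 1)                          # align down
    hi = (start_pfn + nr_pages + align - 1) & ~(align - 1)  # align up
    return lo, hi


# Two 148-page requests that start in different (512-page) pageblocks ...
a = isolation_range(100, 148)   # starts in pageblock 0
b = isolation_range(612, 148)   # starts in pageblock 1
print(a, b)  # (0, 1024) (0, 1024)
```

Both requests map to the same 1024-page range, so moving on to the next pageblock can still land inside a range that another allocation has already isolated.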
-- 
Thanks,

David / dhildenb
