On 17.11.20 19:19, Minchan Kim wrote:
Some special HW requires bulk allocation of high-order pages. For example, 4800 order-4 pages would be the minimum; sometimes it requires more.
To meet the requirement, one option is to reserve a 300M CMA area and request the whole 300M as contiguous memory. However, that doesn't work if even one of the pages in the range is long-term pinned, directly or indirectly. The other option is to repeatedly ask for a higher order (e.g., 2M) than the requested order (64K) until the driver has gathered the necessary amount of memory. This approach makes the allocation very slow because cma_alloc itself is slow, and it can get stuck on one of the pageblocks if it encounters an unmigratable page.
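The sizes quoted above can be checked with a few lines of C. This is only a sketch that assumes 4 KiB base pages; the helper names order_bytes and bulk_bytes are made up for illustration and are not kernel functions.

```c
#include <stddef.h>

/* Assumption: 4 KiB base pages (PAGE_SHIFT == 12 on many architectures). */
#define ASSUMED_PAGE_SHIFT 12

/* Bytes covered by one allocation of the given order. */
static size_t order_bytes(unsigned int order)
{
	return (size_t)1 << (ASSUMED_PAGE_SHIFT + order);
}

/* Total bytes needed for nr allocations of the given order. */
static size_t bulk_bytes(size_t nr, unsigned int order)
{
	return nr * order_bytes(order);
}
```

Under that assumption an order-4 page is 64K, 4800 of them add up to exactly 300M, and the 2M request mentioned above corresponds to order 9.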
To solve the issue, this patch introduces cma_alloc_bulk.
	int cma_alloc_bulk(struct cma *cma, unsigned int align, gfp_t gfp_mask,
			   unsigned int order, size_t nr_requests,
			   struct page **page_array, size_t *nr_allocated);
Most parameters are the same as for cma_alloc, but it additionally takes an array in which to store the allocated pages. Unlike cma_alloc, it skips a pageblock that contains an unmovable page without waiting or stopping, so the API continues to scan other pageblocks to find pages of the requested order.
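The skip-and-continue behaviour described above can be modelled in userspace. This is a toy sketch, not the patch's implementation: pageblocks are reduced to a bool array, and bulk_scan is a hypothetical name.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Toy model: unmovable[i] is true if pageblock i contains an unmovable
 * page. Collect up to nr_requests usable block indices into out_blocks
 * and return how many were found.
 */
static size_t bulk_scan(const bool *unmovable, size_t nr_blocks,
			size_t nr_requests, size_t *out_blocks)
{
	size_t got = 0;

	for (size_t i = 0; i < nr_blocks && got < nr_requests; i++) {
		if (unmovable[i])
			continue;	/* skip the block instead of waiting or failing */
		out_blocks[got++] = i;
	}
	return got;
}
```

A single unmovable block only costs that one block here; the scan still returns whatever the remaining blocks can provide, which may be fewer than requested.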
cma_alloc_bulk is a best-effort approach in that, unlike cma_alloc, it skips pageblocks that have unmovable pages. It doesn't need to be perfect from the beginning at the cost of performance. Thus, the API takes a gfp_t to support __GFP_NORETRY, which is propagated into alloc_contig_page to avoid the functions with significant overhead that increase the CMA allocation success ratio (e.g., migration retries, PCP and LRU draining per pageblock), at the cost of a lower allocation success ratio. If the caller couldn't allocate enough pages with __GFP_NORETRY, it can call the API again without __GFP_NORETRY to increase the success ratio, if it is willing to pay the overhead.
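The two-pass usage described above (cheap attempt first, expensive attempt only for the remainder) might look roughly like the following userspace sketch. The callback type, demo_alloc and alloc_with_fallback are hypothetical names, and the bool flag merely stands in for passing or omitting __GFP_NORETRY.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical allocator callback: returns how many of nr_requests it got. */
typedef size_t (*bulk_alloc_fn)(bool fast, size_t nr_requests);

/* Counts slow-path calls so the demo can show when the fallback ran. */
static unsigned int demo_slow_calls;

/* Made-up stub: the fast pass satisfies 3/4 of a request, the slow pass all. */
static size_t demo_alloc(bool fast, size_t nr_requests)
{
	if (fast)
		return nr_requests - nr_requests / 4;
	demo_slow_calls++;
	return nr_requests;
}

/* Try the cheap pass first; pay for the expensive pass only if needed. */
static size_t alloc_with_fallback(bulk_alloc_fn alloc, size_t nr_requests)
{
	size_t got = alloc(true, nr_requests);

	if (got < nr_requests)
		got += alloc(false, nr_requests - got);
	return got;
}
```

With the stub above, a request for 4800 entries gets 3600 from the fast pass and the remaining 1200 from a single slow pass.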
I'm not a friend of connecting __GFP_NORETRY to PCP and LRU draining. Also, gfp flags apply mostly to compaction (e.g., how to allocate free pages for migration), so this seems a little wrong.
Can we instead introduce
enum alloc_contig_mode {
	/*
	 * Normal mode:
	 *
	 * Retry page migration 5 times, ... TBD
	 *
	 */
	ALLOC_CONTIG_NORMAL = 0,
	/*
	 * Fast mode: e.g., used for bulk allocations.
	 *
	 * Don't retry page migration if it fails, don't drain PCP
	 * lists, don't drain LRU.
	 */
	ALLOC_CONTIG_FAST,
};
This could be extended by ALLOC_CONTIG_HARD in the future, to be used e.g. by virtio-mem (disable PCP, retry a couple more times) ...
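To make the idea concrete, here is a minimal userspace sketch of how such a mode could select behaviour in one place instead of overloading gfp flags. struct contig_policy and contig_policy_for are hypothetical; the normal-mode values (5 tries, draining enabled) are assumptions, since normal mode is still marked TBD above.

```c
#include <stdbool.h>

enum alloc_contig_mode {
	ALLOC_CONTIG_NORMAL = 0,
	ALLOC_CONTIG_FAST,
};

/* Hypothetical knobs a contiguous allocator could derive from the mode. */
struct contig_policy {
	unsigned int max_migrate_tries;	/* total migration attempts */
	bool drain_pcp;			/* drain per-CPU page lists */
	bool drain_lru;			/* drain LRU caches */
};

static struct contig_policy contig_policy_for(enum alloc_contig_mode mode)
{
	switch (mode) {
	case ALLOC_CONTIG_FAST:
		/* One attempt, no retries, no draining. */
		return (struct contig_policy){
			.max_migrate_tries = 1,
			.drain_pcp = false,
			.drain_lru = false,
		};
	case ALLOC_CONTIG_NORMAL:
	default:
		/* Assumed values; normal mode is not fully specified. */
		return (struct contig_policy){
			.max_migrate_tries = 5,
			.drain_pcp = true,
			.drain_lru = true,
		};
	}
}
```

A later ALLOC_CONTIG_HARD would then just be another case in the switch, without touching callers that pass the existing modes.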