The patch titled
     Subject: Re: dma/pool: do not complain if DMA pool is not allocated
has been added to the -mm mm-hotfixes-unstable branch.  Its filename is
     dma-pool-do-not-complain-if-dma-pool-is-not-allocated.patch
This patch will shortly appear at https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patches...
This patch will later appear in the mm-hotfixes-unstable branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
   a) Consider who else should be cc'ed
   b) Prefer to cc a suitable mailing list as well
   c) Ideally: find the original patch on the mailing list and do a
      reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm and is updated there every 2-3 working days
------------------------------------------------------
From: Michal Hocko <mhocko@suse.com>
Subject: Re: dma/pool: do not complain if DMA pool is not allocated
Date: Tue, 9 Aug 2022 17:37:59 +0200
We have a system complaining about order-10 allocation for the DMA pool.
[ 14.017417][ T1] swapper/0: page allocation failure: order:10, mode:0xcc1(GFP_KERNEL|GFP_DMA), nodemask=(null),cpuset=/,mems_allowed=0-7
[ 14.017429][ T1] CPU: 4 PID: 1 Comm: swapper/0 Not tainted 5.14.21-150400.22-default #1 SLE15-SP4 0b6a6578ade2de5c4a0b916095dff44f76ef1704
[ 14.017434][ T1] Hardware name: XXXX
[ 14.017437][ T1] Call Trace:
[ 14.017444][ T1]  <TASK>
[ 14.017449][ T1]  dump_stack_lvl+0x45/0x57
[ 14.017469][ T1]  warn_alloc+0xfe/0x160
[ 14.017490][ T1]  __alloc_pages_slowpath.constprop.112+0xc27/0xc60
[ 14.017497][ T1]  ? rdinit_setup+0x2b/0x2b
[ 14.017509][ T1]  ? rdinit_setup+0x2b/0x2b
[ 14.017512][ T1]  __alloc_pages+0x2d5/0x320
[ 14.017517][ T1]  alloc_page_interleave+0xf/0x70
[ 14.017531][ T1]  atomic_pool_expand+0x4a/0x200
[ 14.017541][ T1]  ? rdinit_setup+0x2b/0x2b
[ 14.017544][ T1]  __dma_atomic_pool_init+0x44/0x90
[ 14.017556][ T1]  dma_atomic_pool_init+0xad/0x13f
[ 14.017560][ T1]  ? __dma_atomic_pool_init+0x90/0x90
[ 14.017562][ T1]  do_one_initcall+0x41/0x200
[ 14.017581][ T1]  kernel_init_freeable+0x236/0x298
[ 14.017589][ T1]  ? rest_init+0xd0/0xd0
[ 14.017596][ T1]  kernel_init+0x16/0x120
[ 14.017599][ T1]  ret_from_fork+0x22/0x30
[ 14.017604][ T1]  </TASK>
[...]
[ 14.018026][ T1] Node 0 DMA free:160kB boost:0kB min:0kB low:0kB high:0kB reserved_highatomic:0KB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:15996kB managed:15360kB mlocked:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
[ 14.018035][ T1] lowmem_reserve[]: 0 0 0 0 0
[ 14.018339][ T1] Node 0 DMA: 0*4kB 0*8kB 0*16kB 1*32kB (U) 0*64kB 1*128kB (U) 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 160kB
The usable memory in the DMA zone is obviously too small for the pool pre-allocation. The allocation failure raises concerns among admins because it is reported like an error state.
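For context on where the order-10 request comes from: the default pool size scales with the total amount of memory and is capped at MAX_ORDER_NR_PAGES (order-10, i.e. 4MB, with 4K pages and MAX_ORDER 11). A sketch of the sizing logic, reconstructed from kernel/dma/pool.c of that era (illustrative, not a verbatim quote):

	/* Default the pool size to 128KB per 1GB of memory, min 128KB,
	 * max MAX_ORDER_NR_PAGES, when no size is given on the command
	 * line. */
	if (!atomic_pool_size) {
		unsigned long pages = totalram_pages() / (SZ_1G / SZ_128K);

		pages = min_t(unsigned long, pages, MAX_ORDER_NR_PAGES);
		atomic_pool_size = max_t(size_t, pages << PAGE_SHIFT, SZ_128K);
	}

On a machine with plenty of RAM this hits the MAX_ORDER cap, hence the order-10 attempt even against a 15MB zone.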
In fact the preallocation itself doesn't expose any actual problem. It is not even clear whether anybody is ever going to use this pool. If somebody does, a warning will be triggered at that point anyway.
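The warning referred to here fires on the consumer side, when an atomic allocation later finds no usable pool. A paraphrased sketch of that failure path in kernel/dma/pool.c (names as in the kernel, body abridged from memory):

	void *dma_alloc_from_pool(struct device *dev, size_t size,
			struct page **ret_page, gfp_t flags,
			bool (*phys_addr_ok)(struct device *, phys_addr_t, size_t))
	{
		struct gen_pool *pool = NULL;
		void *ptr;

		/* Try every pool that could satisfy the gfp constraints... */
		while ((pool = dma_guess_pool(pool, flags))) {
			ptr = __dma_alloc_from_pool(dev, size, pool, ret_page,
						    phys_addr_ok);
			if (ptr)
				return ptr;
		}

		/* ...and complain only when none of them can deliver. */
		WARN(1, "Failed to get suitable pool for %s\n", dev_name(dev));
		return NULL;
	}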
Silence the warning to prevent confusion and bug reports.
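(Note that __GFP_NOWARN only suppresses the page allocator's failure splat; the allocation attempt and its fallback behaviour are unchanged.)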
Link: https://lkml.kernel.org/r/YvJ/V2bor9Q3P6ov@dhcp22.suse.cz
Signed-off-by: Michal Hocko <mhocko@suse.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: John Donnelly <john.p.donnelly@oracle.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
 kernel/dma/pool.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/kernel/dma/pool.c~dma-pool-do-not-complain-if-dma-pool-is-not-allocated
+++ a/kernel/dma/pool.c
@@ -205,7 +205,7 @@ static int __init dma_atomic_pool_init(v
 		ret = -ENOMEM;
 	if (has_managed_dma()) {
 		atomic_pool_dma = __dma_atomic_pool_init(atomic_pool_size,
-						GFP_KERNEL | GFP_DMA);
+				GFP_KERNEL | GFP_DMA | __GFP_NOWARN);
 		if (!atomic_pool_dma)
 			ret = -ENOMEM;
 	}
_
Patches currently in -mm which might be from mhocko@suse.com are
dma-pool-do-not-complain-if-dma-pool-is-not-allocated.patch
This is really material for the dma-mapping tree (besides the obvious coding style problem). I'll try to unwind through the thread ASAP once I've got the more burning issues off my plate, and will see if it otherwise makes sense and pick it up in the proper tree if so.
On Wed 10-08-22 16:00:30, Christoph Hellwig wrote:
This is really material for the dma-mapping tree (besides the obvious coding style problem).
Let me know if I should refresh the patch. I guess you are referring to the 80-characters-per-line limit here.
I'll try to unwind through the thread ASAP once I've got the more burning issues off my plate, and will see if it otherwise makes sense and pick it up in the proper tree if so.
Thanks.
This is what I think should solve your problem properly:
http://git.infradead.org/users/hch/misc.git/shortlog/refs/heads/dma-pool-siz...
On Thu 11-08-22 11:29:11, Christoph Hellwig wrote:
This is what I think should solve your problem properly:
http://git.infradead.org/users/hch/misc.git/shortlog/refs/heads/dma-pool-siz...
http://git.infradead.org/users/hch/misc.git/commit/a29d77929bf4fc563657b29da...
+unsigned long __init nr_pages_per_zone(int zone_index)
+{
+	return arch_zone_highest_possible_pfn[zone_index] -
+		arch_zone_lowest_possible_pfn[zone_index];
+}
+
this will not consider any gaps in the zone. We have zone_managed_pages(), which tells you how many pages in the zone have actually been handed over to the page allocator; that seems like a better fit. It is also much better than basing the size on the global amount of memory, which just doesn't make much sense for constrained zones.
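A sketch of what a zone_managed_pages()-based helper could look like (nr_managed_dma_pages() is a hypothetical name; for_each_zone(), zone_idx() and zone_managed_pages() are existing helpers):

	/* Hypothetical helper: count only the ZONE_DMA pages that were
	 * actually handed to the page allocator, skipping holes and
	 * early reservations. */
	static unsigned long __init nr_managed_dma_pages(void)
	{
		struct zone *zone;
		unsigned long pages = 0;

		for_each_zone(zone)
			if (zone_idx(zone) == ZONE_DMA)
				pages += zone_managed_pages(zone);

		return pages;
	}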
Except that it will not work for this case as

+static unsigned long calculate_pool_size(unsigned long zone_pages)
+{
+	unsigned long nr_pages = min_t(unsigned long,
+			zone_pages / (SZ_1G / SZ_128K),
+			MAX_ORDER_NR_PAGES);
+
+	return max_t(unsigned long, nr_pages << PAGE_SHIFT, SZ_128K);
+}
this will return 128kB, correct?
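Spelling the arithmetic out for the ~15MB ZONE_DMA from the report above (managed:15360kB, 4K pages, so zone_pages = 3840):

	/* zone_pages / (SZ_1G / SZ_128K) = 3840 / 8192 = 0
	 * min_t(..., 0, MAX_ORDER_NR_PAGES)            = 0
	 * max_t(..., 0 << PAGE_SHIFT, SZ_128K)         = SZ_128K
	 *
	 * i.e. the helper floors at 128kB for any zone smaller than 1GB. */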
+	atomic_pool_dma = __dma_atomic_pool_init(
+			calculate_pool_size(nr_zone_dma_pages),
+			GFP_DMA);
The DMA zone still has 126kB of usable memory. I think what you want/need is to do something like

	pool_size = calculate_pool_size(nr_zone_dma_pages);
	do {
		atomic_pool_dma = __dma_atomic_pool_init(pool_size,
				GFP_DMA | __GFP_NOWARN);
		if (atomic_pool_dma)
			break;
		pool_size /= 2;
	} while (pool_size >= PAGE_SIZE);

	if (!atomic_pool_dma)
		/* print something useful */;
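Halving on each failed attempt keeps the fallback bounded (at most log2(pool_size / PAGE_SIZE) retries) while still ending up with the largest pool the zone can actually back.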
Another option would be to consider NR_FREE_PAGES of the zone but that would be inherently racy.
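For completeness, such a check would look roughly like the sketch below (zone_page_state() and NR_FREE_PAGES are the existing accessors; the snapshot can be stale by the time the pool allocation runs, which is the race in question):

	/* Racy by construction: a point-in-time snapshot of the zone's
	 * free pages, with no guarantee it still holds a moment later. */
	static unsigned long dma_zone_free_bytes(struct zone *zone)
	{
		return zone_page_state(zone, NR_FREE_PAGES) << PAGE_SHIFT;
	}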
On Thu, Aug 11, 2022 at 11:51:19AM +0200, Michal Hocko wrote:
this will not consider any gaps in the zone. We have zone_managed_pages(), which tells you how many pages in the zone have actually been handed over to the page allocator; that seems like a better fit. It is also much better than basing the size on the global amount of memory, which just doesn't make much sense for constrained zones.
Yes, that is a much better helper.
Except that it will not work for this case as

+static unsigned long calculate_pool_size(unsigned long zone_pages)
+{
+	unsigned long nr_pages = min_t(unsigned long,
+			zone_pages / (SZ_1G / SZ_128K),
+			MAX_ORDER_NR_PAGES);
+
+	return max_t(unsigned long, nr_pages << PAGE_SHIFT, SZ_128K);
+}
this will return 128kB, correct?
Yes.
The DMA zone still has 126kB of usable memory. I think what you want/need is to do something like
No. If you don't have 128k free you have another bug and we really should warn here. Please go back to that system and figure out what uses up almost all of ZONE_DMA on that system, because we have another big problem there.
On Sat 13-08-22 08:29:13, Christoph Hellwig wrote:
On Thu, Aug 11, 2022 at 11:51:19AM +0200, Michal Hocko wrote:
this will not consider any gaps in the zone. We have zone_managed_pages(), which tells you how many pages in the zone have actually been handed over to the page allocator; that seems like a better fit. It is also much better than basing the size on the global amount of memory, which just doesn't make much sense for constrained zones.
Yes, that is a much better helper.
Except that it will not work for this case as

+static unsigned long calculate_pool_size(unsigned long zone_pages)
+{
+	unsigned long nr_pages = min_t(unsigned long,
+			zone_pages / (SZ_1G / SZ_128K),
+			MAX_ORDER_NR_PAGES);
+
+	return max_t(unsigned long, nr_pages << PAGE_SHIFT, SZ_128K);
+}
this will return 128kB, correct?
Yes.
The DMA zone still has 126kB of usable memory. I think what you want/need is to do something like
No. If you don't have 128k free you have another bug and we really should warn here. Please go back to that system and figure out what uses up almost all of ZONE_DMA on that system, because we have another big problem there.
I do agree that such a large consumption of the DMA zone is unusual. I have reached out to the original reporter and asked for help to investigate further. Let's see whether we can get something out of that. This could really reveal a DMA zone abuser.
Anyway, you seem not to be thrilled about the __GFP_NOWARN approach and I won't push it. But is the existing inconsistency really desirable? I mean, we get a pretty vocal warning if the allocation fails, but no information at all when the zone doesn't have any managed memory. Why should we treat the two cases differently?
On Mon, Aug 15, 2022 at 10:42:01AM +0200, Michal Hocko wrote:
Anyway, you seem not to be thrilled about the __GFP_NOWARN approach and I won't push it. But is the existing inconsistency really desirable? I mean, we get a pretty vocal warning if the allocation fails, but no information at all when the zone doesn't have any managed memory. Why should we treat the two cases differently?
How could we end up having ZONE_DMA without any managed memory to start with except for the case where the total memory is smaller than what fits into ZONE_DMA? If we have such a case we really should warn about it as well.
On Wed 17-08-22 08:00:45, Christoph Hellwig wrote:
On Mon, Aug 15, 2022 at 10:42:01AM +0200, Michal Hocko wrote:
Anyway, you seem not to be thrilled about the __GFP_NOWARN approach and I won't push it. But is the existing inconsistency really desirable? I mean, we get a pretty vocal warning if the allocation fails, but no information at all when the zone doesn't have any managed memory. Why should we treat the two cases differently?
How could we end up having ZONE_DMA without any managed memory to start with except for the case where the total memory is smaller than what fits into ZONE_DMA? If we have such a case we really should warn about it as well.
This can be an early memory reservation from this physical address range. My original report http://lkml.kernel.org/r/Yj28gjonUa9+0yae@dhcp22.suse.cz was referring to such a system (a different one than the one I am dealing with now):

	present:636kB managed:0kB
There is only 636kB present in that ZONE_DMA physical range but nothing has made it to the page allocator in the end.
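This present-but-unmanaged state is exactly what the has_managed_dma() check in the hunk above is for. For reference, a paraphrased sketch of that helper as it lives in mm/page_alloc.c (illustrative, not a verbatim quote):

	bool has_managed_dma(void)
	{
		struct pglist_data *pgdat;

		for_each_online_pgdat(pgdat) {
			struct zone *zone = &pgdat->node_zones[ZONE_DMA];

			/* managed_zone() is true only if some pages in the
			 * zone actually reached the page allocator. */
			if (managed_zone(zone))
				return true;
		}
		return false;
	}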
I noticed this is still in -mm. As stated before, any change to kernel/dma/pool.c should go through the dma-mapping tree, and this patch is not correct. Please drop it.
On Sat, 13 Aug 2022 08:33:29 +0200 Christoph Hellwig hch@lst.de wrote:
I noticed this is still in -mm. As stated before, any change to kernel/dma/pool.c should go through the dma-mapping tree, and this patch is not correct. Please drop it.
I hang onto such a patch until seeing confirmation that it (or an alternative) has been merged elsewhere, so that it doesn't get lost.
I'll assume (without such confirmation) that this issue will be addressed via the dma-mapping tree.