On 7/15/2025 9:25 PM, Muhammad Usama Anjum wrote:
Under memory pressure, dma_alloc_coherent() can fail at resume time, which in turn makes the firmware load fail and the driver crash:
kernel: kworker/u33:5: page allocation failure: order:7, mode:0xc04(GFP_NOIO|GFP_DMA32), nodemask=(null),cpuset=/,mems_allowed=0
kernel: CPU: 1 UID: 0 PID: 7693 Comm: kworker/u33:5 Not tainted 6.11.11-valve17-1-neptune-611-g027868a0ac03 #1 3843143b92e9da0fa2d3d5f21f51beaed15c7d59
kernel: Hardware name: Valve Galileo/Galileo, BIOS F7G0112 08/01/2024
kernel: Workqueue: mhi_hiprio_wq mhi_pm_st_worker [mhi]
kernel: Call Trace:
kernel:  <TASK>
kernel:  dump_stack_lvl+0x4e/0x70
kernel:  warn_alloc+0x164/0x190
kernel:  ? srso_return_thunk+0x5/0x5f
kernel:  ? __alloc_pages_direct_compact+0xaf/0x360
kernel:  __alloc_pages_slowpath.constprop.0+0xc75/0xd70
kernel:  __alloc_pages_noprof+0x321/0x350
kernel:  __dma_direct_alloc_pages.isra.0+0x14a/0x290
kernel:  dma_direct_alloc+0x70/0x270
kernel:  mhi_fw_load_handler+0x126/0x340 [mhi a96cb91daba500cc77f86bad60c1f332dc3babdf]
kernel:  mhi_pm_st_worker+0x5e8/0xac0 [mhi a96cb91daba500cc77f86bad60c1f332dc3babdf]
kernel:  ? srso_return_thunk+0x5/0x5f
kernel:  process_one_work+0x17e/0x330
kernel:  worker_thread+0x2ce/0x3f0
kernel:  ? __pfx_worker_thread+0x10/0x10
kernel:  kthread+0xd2/0x100
kernel:  ? __pfx_kthread+0x10/0x10
kernel:  ret_from_fork+0x34/0x50
kernel:  ? __pfx_kthread+0x10/0x10
kernel:  ret_from_fork_asm+0x1a/0x30
kernel:  </TASK>
kernel: Mem-Info:
kernel: active_anon:513809 inactive_anon:152 isolated_anon:0
         active_file:359315 inactive_file:2487001 isolated_file:0
         unevictable:637 dirty:19 writeback:0
         slab_reclaimable:160391 slab_unreclaimable:39729
         mapped:175836 shmem:51039 pagetables:4415
         sec_pagetables:0 bounce:0
         kernel_misc_reclaimable:0
         free:125666 free_pcp:0 free_cma:0
In the above example, summing all the consumed memory gives about 15.5GB, leaving ~500MB free out of a total of 16GB RAM. So memory is still available, but all of the DMA memory has been exhausted or fragmented.
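For reference, the figures in the log can be checked with a little arithmetic. The sketch below assumes 4 KiB pages (the log does not state the page size), and the helper names are made up for illustration. It converts the order:7 request and the free page count into byte sizes.

```c
#include <assert.h>

/* Assumption: 4 KiB pages; the log itself does not state the page size. */
#define PAGE_SIZE_BYTES 4096UL

/* Size in bytes of one physically contiguous block of the given order. */
static unsigned long order_to_bytes(unsigned int order)
{
    assert(order < 20); /* keep the shift well inside unsigned long */
    return PAGE_SIZE_BYTES << order;
}

/* Convert a page count from the Mem-Info dump to MiB, rounded down. */
static unsigned long pages_to_mib(unsigned long pages)
{
    return pages * PAGE_SIZE_BYTES / (1024UL * 1024UL);
}
```

Under that assumption, order_to_bytes(7) is 524288 bytes (512 KiB of contiguous memory) and pages_to_mib(125666) is 490, which matches the ~500MB free figure. The failure is therefore about finding one contiguous 512 KiB block in the DMA32 range, not about the total amount of free memory.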
Fix it by allocating the buffer only once and then reusing it on every subsequent firmware load. As a consequence, this memory stays allocated for the lifetime of the controller.
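The allocate-once-and-reuse idea can be sketched in plain userspace C as below. This is not the actual patch: struct fw_ctx, fw_buf_get() and fw_buf_release() are hypothetical names, and malloc()/free() stand in for dma_alloc_coherent()/dma_free_coherent().

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for driver state; not the real struct mhi_controller. */
struct fw_ctx {
    void *buf;   /* cached firmware buffer, NULL until the first load */
    size_t size; /* size of the cached buffer in bytes */
};

/*
 * Return a buffer of at least `size` bytes, allocating only when no
 * suitable buffer is cached. malloc() stands in for dma_alloc_coherent(),
 * which likewise returns NULL on failure.
 */
static void *fw_buf_get(struct fw_ctx *ctx, size_t size)
{
    assert(ctx != NULL && size > 0);

    if (ctx->buf && ctx->size >= size)
        return ctx->buf; /* reuse: no new allocation at resume time */

    free(ctx->buf); /* cached buffer missing or too small */
    ctx->buf = malloc(size);
    ctx->size = ctx->buf ? size : 0;
    return ctx->buf;
}

/* Release the cached buffer, e.g. when the device is torn down. */
static void fw_buf_release(struct fw_ctx *ctx)
{
    free(ctx->buf);
    ctx->buf = NULL;
    ctx->size = 0;
}
```

With this shape, the only allocation that can fail is the first one, typically at probe time when memory is less fragmented. The cost is that the buffer remains reserved while the device is idle.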
The BHI buffer is not needed anymore after the initial firmware has been loaded. So IMO we should not keep it allocated just for the purpose of avoiding a possible OOM in the future.