On Wed, Dec 11, 2024 at 04:32:50AM +0000, Matthew Wilcox (Oracle) wrote:
Today we account each page individually to the memcg, which works well enough, if a little inefficiently (N atomic operations per page instead of N per allocation). Unfortunately, the stats can get out of sync when i915 calls vmap() with VM_MAP_PUT_PAGES. The pages being passed were not allocated by vmalloc, so the MEMCG_VMALLOC counter was never incremented. But it is decremented when the pages are freed with vfree().
Solve all of this by tracking the memcg at the vm_struct level. This logic has to live in the memcontrol file as it calls several functions which are currently static.
Fixes: b944afc9d64d ("mm: add a VM_MAP_PUT_PAGES flag for vmap")
Cc: stable@vger.kernel.org
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
 include/linux/memcontrol.h |  7 ++++++
 include/linux/vmalloc.h    |  3 +++
 mm/memcontrol.c            | 46 ++++++++++++++++++++++++++++++++++++++
 mm/vmalloc.c               | 14 ++++++------
 4 files changed, 63 insertions(+), 7 deletions(-)
This would work, but it seems somewhat complicated. The atomics in memcg charging and the vmstat updates are batched, and the per-page overhead is for the most part cheap per-cpu ops. Not an issue per se.
You could do for MEMCG_VMALLOC what you did for nr_vmalloc_pages:
diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index 634162271c00..a889bb04405c 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -3353,7 +3353,11 @@ void vfree(const void *addr)
 		struct page *page = vm->pages[i];
 
 		BUG_ON(!page);
-		mod_memcg_page_state(page, MEMCG_VMALLOC, -1);
+
+		/* Pages were allocated elsewhere */
+		if (!(vm->flags & VM_MAP_PUT_PAGES))
+			mod_memcg_page_state(page, MEMCG_VMALLOC, -1);
+
 		/*
 		 * High-order allocs for huge vmallocs are split, so
 		 * can be freed as an array of order-0 allocations