The patch titled Subject: hugetlb: fix hugetlb cgroup refcounting during vma split has been added to the -mm tree. Its filename is hugetlb-fix-hugetlb-cgroup-refcounting-during-vma-split.patch
This patch should soon appear at https://ozlabs.org/~akpm/mmots/broken-out/hugetlb-fix-hugetlb-cgroup-refcoun... and later at https://ozlabs.org/~akpm/mmotm/broken-out/hugetlb-fix-hugetlb-cgroup-refcoun...
Before you just go and hit "reply", please: a) Consider who else should be cc'ed b) Prefer to cc a suitable mailing list as well c) Ideally: find the original patch on the mailing list and do a reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next and is updated there every 3-4 working days
------------------------------------------------------ From: Mike Kravetz mike.kravetz@oracle.com Subject: hugetlb: fix hugetlb cgroup refcounting during vma split
Guillaume Morin reported hitting the following WARNING followed by GPF or NULL pointer deference either in cgroups_destroy or in the kill_css path.:
percpu ref (css_release) <= 0 (-1) after switching to atomic WARNING: CPU: 23 PID: 130 at lib/percpu-refcount.c:196 percpu_ref_switch_to_atomic_rcu+0x127/0x130 CPU: 23 PID: 130 Comm: ksoftirqd/23 Kdump: loaded Tainted: G O 5.10.60 #1 RIP: 0010:percpu_ref_switch_to_atomic_rcu+0x127/0x130 Call Trace: rcu_core+0x30f/0x530 rcu_core_si+0xe/0x10 __do_softirq+0x103/0x2a2 ? sort_range+0x30/0x30 run_ksoftirqd+0x2b/0x40 smpboot_thread_fn+0x11a/0x170 kthread+0x10a/0x140 ? kthread_create_worker_on_cpu+0x70/0x70 ret_from_fork+0x22/0x30
Upon further examination, it was discovered that the css structure was associated with hugetlb reservations.
For private hugetlb mappings the vma points to a reserve map that contains a pointer to the css. At mmap time, reservations are set up and a reference to the css is taken. This reference is dropped in the vma close operation; hugetlb_vm_op_close. However, if a vma is split no additional reference to the css is taken yet hugetlb_vm_op_close will be called twice for the split vma resulting in an underflow.
Fix by taking another reference in hugetlb_vm_op_open. Note that the reference is only taken for the owner of the reserve map. In the more common fork case, the pointer to the reserve map is cleared for non-owning vmas.
Link: https://lkml.kernel.org/r/20210830215015.155224-1-mike.kravetz@oracle.com Fixes: e9fe92ae0cd2 ("hugetlb_cgroup: add reservation accounting for private mappings") Signed-off-by: Mike Kravetz mike.kravetz@oracle.com Reported-by: Guillaume Morin guillaume@morinfr.org Suggested-by: Guillaume Morin guillaume@morinfr.org Tested-by: Guillaume Morin guillaume@morinfr.org Cc: stable@vger.kernel.org Signed-off-by: Andrew Morton akpm@linux-foundation.org ---
include/linux/hugetlb_cgroup.h | 12 ++++++++++++ mm/hugetlb.c | 4 +++- 2 files changed, 15 insertions(+), 1 deletion(-)
--- a/include/linux/hugetlb_cgroup.h~hugetlb-fix-hugetlb-cgroup-refcounting-during-vma-split +++ a/include/linux/hugetlb_cgroup.h @@ -121,6 +121,13 @@ static inline void hugetlb_cgroup_put_rs css_put(&h_cg->css); }
+static inline void resv_map_dup_hugetlb_cgroup_uncharge_info( + struct resv_map *resv_map) +{ + if (resv_map->css) + css_get(resv_map->css); +} + extern int hugetlb_cgroup_charge_cgroup(int idx, unsigned long nr_pages, struct hugetlb_cgroup **ptr); extern int hugetlb_cgroup_charge_cgroup_rsvd(int idx, unsigned long nr_pages, @@ -199,6 +206,11 @@ static inline void hugetlb_cgroup_put_rs { }
+static inline void resv_map_dup_hugetlb_cgroup_uncharge_info( + struct resv_map *resv_map) +{ +} + static inline int hugetlb_cgroup_charge_cgroup(int idx, unsigned long nr_pages, struct hugetlb_cgroup **ptr) { --- a/mm/hugetlb.c~hugetlb-fix-hugetlb-cgroup-refcounting-during-vma-split +++ a/mm/hugetlb.c @@ -4106,8 +4106,10 @@ static void hugetlb_vm_op_open(struct vm * after this open call completes. It is therefore safe to take a * new reference here without additional locking. */ - if (resv && is_vma_resv_set(vma, HPAGE_RESV_OWNER)) + if (resv && is_vma_resv_set(vma, HPAGE_RESV_OWNER)) { + resv_map_dup_hugetlb_cgroup_uncharge_info(resv); kref_get(&resv->refs); + } }
static void hugetlb_vm_op_close(struct vm_area_struct *vma) _
Patches currently in -mm which might be from mike.kravetz@oracle.com are
hugetlb-simplify-prep_compound_gigantic_page-ref-count-racing-code.patch hugetlb-drop-ref-count-earlier-after-page-allocation.patch hugetlb-before-freeing-hugetlb-page-set-dtor-to-appropriate-value.patch hugetlb-fix-hugetlb-cgroup-refcounting-during-vma-split.patch