On Tue, Aug 13, 2019 at 4:54 PM Mike Kravetz mike.kravetz@oracle.com wrote:
On 8/8/19 4:13 PM, Mina Almasry wrote:
For shared mappings, the pointer to the hugetlb_cgroup to uncharge lives in the resv_map entries, in file_region->reservation_counter.
When a file_region entry is added to the resv_map via region_add, we also charge the appropriate hugetlb_cgroup and put the pointer to that in file_region->reservation_counter. This is slightly delicate since we need to not modify the resv_map until we know that charging the reservation has succeeded. If charging doesn't succeed, we report the error to the caller, so that the kernel fails the reservation.
I wish we did not need to modify these region_() routines as they are already difficult to understand. However, I see no other way with the desired semantics.
On region_del, which is when the hugetlb memory is unreserved, we delete the file_region entry in the resv_map, but also uncharge the file_region->reservation_counter.
mm/hugetlb.c | 208 +++++++++++++++++++++++++++++++++++++++++---------- 1 file changed, 170 insertions(+), 38 deletions(-)
diff --git a/mm/hugetlb.c b/mm/hugetlb.c index 235996aef6618..d76e3137110ab 100644 --- a/mm/hugetlb.c +++ b/mm/hugetlb.c @@ -242,8 +242,72 @@ struct file_region { struct list_head link; long from; long to; +#ifdef CONFIG_CGROUP_HUGETLB
/** On shared mappings, each reserved region appears as a struct* file_region in resv_map. These fields hold the info needed to* uncharge each reservation.*/struct page_counter *reservation_counter;unsigned long pages_per_hpage;+#endif };
+/* Must be called with resv->lock held. Calling this with dry_run == true will
- count the number of pages added but will not modify the linked list.
- */
+static long consume_regions_we_overlap_with(struct file_region *rg,
struct list_head *head, long f, long *t,struct hugetlb_cgroup *h_cg,struct hstate *h,bool dry_run)+{
long add = 0;struct file_region *trg = NULL, *nrg = NULL;/* Consume any regions we now overlap with. */nrg = rg;list_for_each_entry_safe(rg, trg, rg->link.prev, link) {if (&rg->link == head)break;if (rg->from > *t)break;/* If this area reaches higher then extend our area to* include it completely. If this is not the first area* which we intend to reuse, free it.*/if (rg->to > *t)*t = rg->to;if (rg != nrg) {/* Decrement return value by the deleted range.* Another range will span this area so that by* end of routine add will be >= zero*/add -= (rg->to - rg->from);if (!dry_run) {list_del(&rg->link);kfree(rg);Is it possible that the region struct we are deleting pointed to a reservation_counter? Perhaps even for another cgroup? Just concerned with the way regions are coalesced that we may be deleting counters.
Yep, that needs to be handled I think. Thanks for catching!
-- Mike Kravetz