On 9/26/19 5:55 PM, Mina Almasry wrote:
Provided we keep the existing controller untouched, should the new controller track:
- only reservations, or
- both reservations and allocations for which no reservations exist
(such as the MAP_NORESERVE case)?
I like the 'both' approach. Seems to me a counter like that would work automatically regardless of whether the application is allocating hugetlb memory with NORESERVE or not. NORESERVE allocations cannot cut into reserved hugetlb pages, correct?
Correct. One other easy way to allocate huge pages without reserves (that I know is used today) is via the fallocate system call.
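For illustration, a minimal sketch of the fallocate path mentioned above. It assumes the file lives on a mounted hugetlbfs; the helper name hugetlb_prealloc and any path passed to it are hypothetical, not anything taken from this thread.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: instantiate huge pages for the first `len` bytes
 * of a file on hugetlbfs by calling fallocate(), with no mmap() involved.
 * Returns 0 on success, -1 on failure with errno set.
 */
int hugetlb_prealloc(const char *path, off_t len)
{
    assert(path != NULL);

    int fd = open(path, O_CREAT | O_RDWR, 0600);
    if (fd < 0)
        return -1;

    /* mode 0 is a plain allocation of the range [0, len). */
    int ret = fallocate(fd, 0, 0, len);

    /* Preserve fallocate()'s errno across close(). */
    int saved_errno = errno;
    close(fd);
    errno = saved_errno;

    return ret;
}
```

A caller would pass a path under its own hugetlbfs mount point and a length covering the pages it wants instantiated. On failure the helper returns -1 and leaves errno set by open() or fallocate().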
If so, then applications that allocate with NORESERVE will get sigbused when they hit their limit, and applications that allocate without NORESERVE may get an error at mmap time but will always be within their limits while they access the mmap'd memory, correct?
Correct. At page allocation time we can easily check to see if a reservation exists and not charge. For any specific page within a hugetlbfs file, a charge would happen at mmap time or allocation time.
One exception (that I can think of) to this "mmap(RESERVE) will not cause a SIGBUS" rule is the case of hole punch. If someone punches a hole in a file, they remove not only the pages associated with the file but the reservation information as well. Therefore, a subsequent fault will be the same as an allocation without reservation.
I 'think' the code to remove/truncate a file will work correctly as it is today, but I need to think about this some more.
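The charging rules discussed so far can be sketched as a small user-space model. This is only an illustration under stated assumptions, not kernel code: the struct and function names are hypothetical, the counter is in units of huge pages, and a failed fault stands in for SIGBUS.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-cgroup 'both' counter, in units of huge pages. */
struct hugetlb_counter {
    long usage;
    long limit;
};

/* Hypothetical state of one huge page offset within a hugetlbfs file. */
struct page_slot {
    bool reserved;   /* charged at mmap time */
    bool allocated;  /* page has been instantiated */
};

static bool try_charge(struct hugetlb_counter *c, long pages)
{
    if (c->usage + pages > c->limit)
        return false;
    c->usage += pages;
    return true;
}

/* mmap without MAP_NORESERVE: charge every page not yet charged, or fail. */
bool model_mmap_reserve(struct hugetlb_counter *c, struct page_slot *s, int n)
{
    long need = 0;
    for (int i = 0; i < n; i++)
        if (!s[i].reserved && !s[i].allocated)
            need++;
    if (!try_charge(c, need))
        return false;            /* error at mmap time */
    for (int i = 0; i < n; i++)
        if (!s[i].allocated)
            s[i].reserved = true;
    return true;
}

/* Page fault: returns false to model SIGBUS. */
bool model_fault(struct hugetlb_counter *c, struct page_slot *s)
{
    if (s->allocated)
        return true;
    if (!s->reserved && !try_charge(c, 1))
        return false;            /* no reservation and over the limit */
    s->allocated = true;         /* reserved pages were charged at mmap */
    return true;
}

/* Hole punch: drop the page and its reservation, and uncharge once. */
void model_punch_hole(struct hugetlb_counter *c, struct page_slot *s)
{
    if (s->reserved || s->allocated)
        c->usage--;
    s->reserved = false;
    s->allocated = false;
    assert(c->usage >= 0);
}
```

In this model a page is charged exactly once, either at mmap time or at allocation time. After model_punch_hole() the slot behaves like a NORESERVE allocation, so a later fault can fail if another allocation has taken the freed headroom in the meantime.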
So the 'both' counter seems like a one size fits all.
I think the only sticking point left is whether an added controller can support both cgroup-v2 and cgroup-v1. If I could get confirmation on that I'll provide a patchset.
Sorry, but I cannot provide cgroup expertise.