On Mon, Feb 18, 2019 at 06:38:25PM +0100, Michal Hocko wrote:
On Mon 18-02-19 17:16:34, Greg KH wrote:
On Mon, Feb 18, 2019 at 10:30:44AM -0500, Rik van Riel wrote:
On Mon, 2019-02-18 at 14:43 +0100, Greg Kroah-Hartman wrote:
4.20-stable review patch. If anyone has any objections, please let me know.
From: Dave Chinner <dchinner@redhat.com>
commit a9a238e83fbb0df31c3b9b67003f8f9d1d1b6c96 upstream.
This reverts commit 172b06c32b9497 ("mm: slowly shrink slabs with a relatively small number of objects").
This revert will result in the slab caches of dead cgroups with a small number of remaining objects never getting reclaimed, which can be a memory leak in some configurations.
But hey, that's your tradeoff to make.
That's what is in Linus's tree. Should we somehow diverge from that?
I believe we should start working on a memcg-specific solution to minimize regressions for others, and build a more complex solution from there.
Can we special case dead memcgs in the slab reclaim and reclaim more aggressively?
It's probably better to start a new thread to discuss this issue (btw, doesn't LSF/MM look like the best place to do it? I can send a proposal).
But I don't think dead cgroups are special here. At the moment a cgroup is deleted, its slab objects may still be in active use by processes in other cgroups, so we can't reclaim them. Slab objects (vfs objects first of all) are quite often shared between cgroups, and we can't just ignore that.
So in order to avoid leaks we'll need to apply some artificial pressure constantly, and then it's not clear why we need to do it separately for dead and living cgroups.
So I still believe that Rik's/my approach is the right thing to do; we just need to apply the pressure gently and handle all corner cases (e.g. the concurrency issues spotted by Dave).
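To make "apply the pressure gently" concrete, here is a rough userspace sketch of the arithmetic, not the actual patch. It assumes the per-shrinker scan target is roughly (freeable >> priority) * 4 / seeks; the function names and the min(freeable, batch) floor are illustrative assumptions, not kernel code.

```c
#include <stdint.h>

/* Illustrative values; they are assumed to mirror the kernel's
 * DEF_PRIORITY, DEFAULT_SEEKS and SHRINK_BATCH. */
enum { SKETCH_DEF_PRIORITY = 12, SKETCH_DEFAULT_SEEKS = 2, SKETCH_BATCH = 128 };

/* Purely proportional pressure: a cache with fewer than
 * 2^priority freeable objects rounds down to zero. */
uint64_t scan_delta_plain(uint64_t freeable, int priority, int seeks)
{
    uint64_t delta = freeable >> priority;

    delta *= 4;
    return delta / (uint64_t)seeks;
}

/* Same calculation, but with a minimal floor so that small caches
 * still see some pressure on every pass (hypothetical helper). */
uint64_t scan_delta_floor(uint64_t freeable, int priority, int seeks,
                          uint64_t batch)
{
    uint64_t delta = scan_delta_plain(freeable, priority, seeks);
    uint64_t floor = freeable < batch ? freeable : batch;

    return delta > floor ? delta : floor;
}
```

With 1000 freeable objects at priority 12, the plain formula asks for 0 objects, so a small cache is never scanned; with the floor it asks for 128. For large caches the floor makes no difference.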
Generally speaking, the problem occurs because the lifecycle of a slab object can be much longer than the lifecycle of the corresponding memory cgroup. And because the object pins the memcg, we're wasting a lot of memory. Right now we allow a certain amount of vfs objects to reside in memory pretty much forever unless we have really strong memory pressure. It's arguably fine because inodes and dentries are relatively small, but if each of them holds a 200kb+ dead memcg, it becomes very noticeable.
So we either have to apply the memory pressure more evenly (what Rik and I are proposing), or completely reparent slab objects on cgroup removal.
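A toy model of the pinning and of the reparenting alternative, in case it helps the discussion. Every type and helper here is hypothetical and only mimics the reference-counting idea; none of it is a kernel API.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a memory cgroup (hypothetical, not struct mem_cgroup). */
struct toy_memcg {
    struct toy_memcg *parent;
    int refs;     /* references held by charged objects */
    bool online;  /* false once the cgroup has been removed */
    bool freed;   /* true once the structure could be released */
};

/* Toy stand-in for a slab object such as a dentry or inode. */
struct toy_obj {
    struct toy_memcg *memcg;  /* cgroup the object is charged to */
};

/* A dead memcg can only go away once nothing references it. */
void toy_maybe_free(struct toy_memcg *m)
{
    if (!m->online && m->refs == 0)
        m->freed = true;
}

/* Charging an object takes a reference, which pins the memcg. */
void toy_charge(struct toy_obj *o, struct toy_memcg *m)
{
    o->memcg = m;
    m->refs++;
}

/* Removing the cgroup does not free it while objects still pin it. */
void toy_offline(struct toy_memcg *m)
{
    m->online = false;
    toy_maybe_free(m);
}

/* Reparenting: move the object's charge to the parent so the
 * dead child is no longer pinned by it. */
void toy_reparent(struct toy_obj *o)
{
    struct toy_memcg *child = o->memcg;
    struct toy_memcg *parent = child->parent;

    if (parent == NULL)
        return;
    parent->refs++;
    o->memcg = parent;
    child->refs--;
    toy_maybe_free(child);
}
```

In this model a single long-lived object keeps the removed child around indefinitely; it is released either when the object itself is reclaimed (the pressure approach) or when the object is moved to the parent (the reparenting approach).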
Thanks!