On the Linux stable kernel (tested on 4.14), reading memory.stat takes roughly 100 ms to 700 ms when tens of thousands of ghost cgroups are pinned by lingering page cache.
Repro steps (tested on 4.14 kernel):
$ cat /tmp/make_zombies
mkdir /tmp/fs
mount -t tmpfs nodev /tmp/fs
for i in {1..10000}; do
  mkdir /sys/fs/cgroup/memory/z$i
  (echo $BASHPID >> /sys/fs/cgroup/memory/z$i/cgroup.procs && echo $i > /tmp/fs/$i)
done

# establish baseline
$ perf stat -r3 cat /sys/fs/cgroup/memory/memory.stat > /dev/null
0.011642670 seconds time elapsed

$ bash /tmp/make_zombies
$ perf stat -r3 cat /sys/fs/cgroup/memory/memory.stat > /dev/null
0.134939281 seconds time elapsed

$ rmdir /sys/fs/cgroup/memory/z*
$ perf stat -r3 cat /sys/fs/cgroup/memory/memory.stat > /dev/null
0.135323145 seconds time elapsed
# even after rmdir we have zombies, so still slow.
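
One rough way to confirm that zombies remain after the rmdir is to compare the memory controller's num_cgroups value in /proc/cgroups with the number of cgroup directories still visible. The sketch below is not part of the original report: the function names are hypothetical, and it assumes a cgroup v1 memory hierarchy mounted at /sys/fs/cgroup/memory and that num_cgroups still counts removed-but-pinned cgroups, which is my understanding of kernels of this era.

```shell
#!/bin/sh
# Hypothetical helpers, not from the original report.

# Print the number of memory cgroups the kernel still tracks.
# Reads a file in /proc/cgroups format:
#   subsys_name hierarchy num_cgroups enabled
kernel_memcg_count() {
    awk '$1 == "memory" { print $3 }' "${1:-/proc/cgroups}"
}

# Print the number of memory cgroup directories still visible,
# including the root of the hierarchy.
visible_memcg_count() {
    find "${1:-/sys/fs/cgroup/memory}" -type d | wc -l | tr -d ' '
}

# Print the difference, an estimate of the zombie cgroup count
# (assumes num_cgroups includes removed-but-pinned cgroups).
zombie_memcg_estimate() {
    k=$(kernel_memcg_count "$1")
    v=$(visible_memcg_count "$2")
    echo $((k - v))
}
```

Calling zombie_memcg_estimate with no arguments uses the default paths. If the assumption holds, the estimate should stay near 10000 after the rmdir step and drop only once the tmpfs pages pinning the cgroups are released.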
The fix is already present in Linux master (since 4.16) in the following commits:
c9019e9bf42e66d028d70d2da6206cad4dd9250d mm: memcontrol: eliminate raw access to stat and event counters
284542656e22c43fdada8c8cc0ca9ede8453eed7 mm: memcontrol: implement lruvec stat functions on top of each other
a983b5ebee57209c99f68c8327072f25e0e6e3da mm: memcontrol: fix excessive complexity in memory.stat reporting
c3cc39118c3610eb6ab4711bc624af7fc48a35fe mm: memcontrol: fix NR_WRITEBACK leak in memcg and system stats
e27be240df53f1a20c659168e722b5d9f16cc7f4 mm: memcg: make sure memory.events is uptodate when waking pollers
I would like to request that the above commits be cherry-picked to the linux-stable 4.14 branch.
Thanks,
Vaibhav