On Fri, Aug 04, 2023 at 02:59:28PM -0400, Lucas Karpinski wrote:
On Fri, Aug 04, 2023 at 12:37:16PM -0400, Johannes Weiner wrote:
On Fri, Aug 04, 2023 at 11:37:33AM -0400, Lucas Karpinski wrote:
The test allocates dcache inside a cgroup, destroys the cgroups, and then checks the sanity of the numbers at the parent level. It fails because dentries are freed with an RCU delay - a debugging sleep shows that usage drops as expected shortly after.
Insert a 1s sleep after completing the cgroup creation/deletions. This should be good enough, assuming that machines running those tests are otherwise not very busy. This commit is directly inspired by Johannes over at the link below.
Link: https://lore.kernel.org/all/20230801135632.1768830-1-hannes@cmpxchg.org/
Signed-off-by: Lucas Karpinski <lkarpins@redhat.com>
Maybe I'm missing something, but there isn't a limit set anywhere that would cause the dentries to be reclaimed and freed, no? When the subgroups are deleted, the objects are just moved to the parent. The counters inside the parent (which are hierarchical) shouldn't change.
So this seems to be a different scenario than test_kmem_basic. If the test is failing for you, I can't quite see why.
You're right, the parent inherits the counters, and it should behave the same whether I remove the child directly or move it under another cgroup. I do see the behaviour you described on my x86_64 setup, but the wrong behaviour on my aarch64 dev platform. I'll take a closer look, but just wanted to leave an example here of what I see.
Example of slab size pre/post sleep: slab_pre = 18164688, slab_post = 3360000
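For anyone who wants to reproduce numbers like these, here is a minimal sketch of the sampling. It is a Python debugging aid, not part of the selftest: the helper names (stat_value, read_slab, sample_pre_post) are hypothetical, the path to the parent cgroup's memory.stat must be supplied by the caller, and the 1s default simply mirrors the sleep proposed in the patch.

```python
import time


def stat_value(text, key):
    """Return the integer value of `key` from memory.stat-style text."""
    for line in text.splitlines():
        fields = line.split()
        if len(fields) == 2 and fields[0] == key:
            return int(fields[1])
    raise KeyError(key)


def read_slab(path):
    # `path` is an assumption: the caller passes the parent cgroup's
    # memory.stat file, wherever the test created that cgroup.
    with open(path) as f:
        return stat_value(f.read(), "slab")


def sample_pre_post(read_value, delay_s=1.0):
    """Read a counter, sleep for delay_s seconds, then read it again."""
    pre = read_value()
    time.sleep(delay_s)
    post = read_value()
    return pre, post
```

Calling sample_pre_post(lambda: read_slab(path)) right after the child cgroups are removed gives the pre/post pair. If the gap between the two values closes with a longer delay, that points at delayed freeing rather than an accounting bug.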
Thanks, Lucas
I looked into the failures and have a proposed solution, but I'd like some feedback first. The kernel entry in memory.stat is updated in a way that accounts for all charged / uncharged pages, so it looks like it makes more sense to use that single entry rather than `slab + anon + file + kernel_stack + pagetables + percpu + sock', as it would cover all utilization.
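To make the comparison concrete, here is a rough Python sketch that puts the two quantities side by side for a given memory.stat dump. The function names are hypothetical and it is only meant for eyeballing the gap on a given machine. It does not decide which entries the final check should use.

```python
# Entries summed individually, as listed above.
COMPONENTS = ("slab", "anon", "file", "kernel_stack",
              "pagetables", "percpu", "sock")


def parse_memory_stat(text):
    """Parse memory.stat-style "key value" lines into a dict of ints."""
    stats = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) == 2:
            stats[fields[0]] = int(fields[1])
    return stats


def component_sum(stats):
    """Sum the individually listed entries; missing keys count as 0."""
    return sum(stats.get(key, 0) for key in COMPONENTS)


def report(stats):
    """Return (kernel entry, component sum) for side-by-side comparison."""
    return stats["kernel"], component_sum(stats)
```

Feeding it the contents of the parent's memory.stat before and after the child cgroups are removed shows whether the single kernel entry and the summed entries move together.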
Thanks, Lucas