On Mon 20-09-21 23:38:40, Vishnu Rangayyan wrote:
Processes inside a memcg that get core dumped when there is less memory available in the memcg can have the core dumping interrupted by the oom-killer. We saw this with qemu processes inside a memcg, as in this trace below. The memcg was not out of memory when the core dump was triggered.
Why is it important to mention that the memcg was not OOM when the dump was triggered?
[201169.028782] qemu-kata-syste invoked oom-killer: gfp_mask=0x101c4a(GFP_NOFS|__GFP_HIGHMEM|__GFP_HARDWALL|__GFP_MOVABLE|__GFP_WRITE), order=0, oom_score_adj=-100
[...]
[201169.028863] memory: usage 12218368kB, limit 12218368kB, failcnt 1728013
It obviously is OOM for the particular allocation from the core dumping code: usage equals the limit.
[201169.028864] memory+swap: usage 12218368kB, limit 9007199254740988kB, failcnt 0
[201169.028864] kmem: usage 154424kB, limit 9007199254740988kB, failcnt 0
[201169.028880] oom-kill:constraint=CONSTRAINT_MEMCG,nodemask=(null),cpuset=podacfa3d53-2068-4b61-a754-fa21968b4201,mems_allowed=0-1,oom_memcg=/kubepods/burstable/podacfa3d53-2068-4b61-a754-fa21968b4201,task_memcg=/kubepods/burstable/podacfa3d53-2068-4b61-a754-fa21968b4201,task=qemu-kata-syste,pid=1887079,uid=0
[201169.028888] Memory cgroup out of memory: Killed process 1887079 (qemu-kata-syste) total-vm:13598556kB, anon-rss:39836kB, file-rss:8712kB, shmem-rss:12017992kB, UID:0 pgtables:24204kB oom_score_adj:-100
[201169.045201] oom_reaper: reaped process 1887079 (qemu-kata-syste), now anon-rss:0kB, file-rss:28kB, shmem-rss:12018016kB
This change adds an fsync only for regular file core dumps based on a configurable limit core_sync_bytes placed alongside other core dump params and defaults the limit to (an arbitrary value) of 128KB. Setting core_sync_bytes to zero disables the sync.
This explains neither the problem nor the solution. Why is fsync helping at all? Why do we need a new sysctl to address the problem, and how does it help to prevent the memcg OOM? Also, why is this a problem in the first place?
Have a look at the OOM report. It says that only 8MB of the 11GB limit is consumed by file-backed memory. The vast majority (98%) is sitting in shmem, and fsync will not help a wee bit there.
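For anyone who wants to check those numbers, here is a minimal sketch of the arithmetic in Python. The values are copied from the OOM report above; the helper names (share_of_limit, kb_to_mib, kb_to_gib) are made up for illustration. It assumes the killed task's file-rss and shmem-rss are a fair proxy for the memcg's file-backed and shmem charges, which is the reading used in this mail, not something the report states directly.

```python
# Figures taken from the OOM report quoted in this thread (all in kB).
LIMIT_KB = 12218368      # "memory: usage 12218368kB, limit 12218368kB"
FILE_RSS_KB = 8712       # "file-rss:8712kB"
SHMEM_RSS_KB = 12017992  # "shmem-rss:12017992kB"
ANON_RSS_KB = 39836      # "anon-rss:39836kB"


def share_of_limit(kb, limit_kb=LIMIT_KB):
    """Return kb as a percentage of the memcg limit (hypothetical helper)."""
    return 100.0 * kb / limit_kb


def kb_to_mib(kb):
    """Convert kB (as printed by the kernel, 1kB = 1024 bytes) to MiB."""
    return kb / 1024


def kb_to_gib(kb):
    """Convert kB to GiB."""
    return kb / (1024 * 1024)


if __name__ == "__main__":
    print(f"limit:     {kb_to_gib(LIMIT_KB):.2f} GiB")
    print(f"file-rss:  {kb_to_mib(FILE_RSS_KB):.2f} MiB "
          f"({share_of_limit(FILE_RSS_KB):.3f}% of limit)")
    print(f"shmem-rss: {kb_to_gib(SHMEM_RSS_KB):.2f} GiB "
          f"({share_of_limit(SHMEM_RSS_KB):.1f}% of limit)")
    print(f"anon-rss:  {kb_to_mib(ANON_RSS_KB):.2f} MiB "
          f"({share_of_limit(ANON_RSS_KB):.3f}% of limit)")
```

Under that assumption, even writing back every file-backed page would release well under 0.1% of the limit, while shmem accounts for roughly 98% of it.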