On 5/25/23 10:11, Michal Koutný wrote:
On Thu, May 25, 2023 at 12:35:18PM +0800, Ming Lei ming.lei@redhat.com wrote:
It is less a problem if the cgroup to be destroyed also has other controllers like memory that will call cgroup_rstat_flush() which will clean up the reference count. If block is the only controller that uses rstat, these offline blkcg and blkgs may never be freed leaking more and more memory over time.
On v2, io implies memory too. Do you observe the leak on the v2 system too?
(Beware that (not only) dirty pages would pin offlined memcg, so the actual mem_cgroup_css_release and cgroup_rstat_flush would be further delayed.)
The memory leak isn't observed on cgroup v2 as the cgroup_rstat_flush() call by the memory controller will help to flush the extra references holding back blkcgs. It is only happening with a cgroup v1 configuration where blkcg is standalone in a cgroup.
To prevent this potential memory leak:
- flush blkcg per-cpu stats list in __blkg_release(), when no new stat
can be added
- add global blkg_stat_lock for covering concurrent parent blkg stat
update
It's bit unfortunate yet another lock is added :-/
IIUC, even Waiman's patch (flush in blkcg_destroy_blkcfs) would need synchronization for different CPU replicas flushes in blkcg_iostat_update, right?
Right, the goal is to prevent concurrent call of blkcg_iostat_update() to the same blkg.
Cheers, Longman