On Thu, May 25, 2023 at 04:11:34PM +0200, Michal Koutný wrote:
On Thu, May 25, 2023 at 12:35:18PM +0800, Ming Lei ming.lei@redhat.com wrote:
It is less a problem if the cgroup to be destroyed also has other controllers like memory that will call cgroup_rstat_flush() which will clean up the reference count. If block is the only controller that uses rstat, these offline blkcg and blkgs may never be freed leaking more and more memory over time.
On v2, io implies memory too. Do you observe the leak on the v2 system too?
blkg leak is only observed on v1.
(Beware that (not only) dirty pages would pin offlined memcg, so the actual mem_cgroup_css_release and cgroup_rstat_flush would be further delayed.)
To prevent this potential memory leak:
- flush blkcg per-cpu stats list in __blkg_release(), when no new stat
can be added
- add global blkg_stat_lock for covering concurrent parent blkg stat
update
It's bit unfortunate yet another lock is added :-/
Cause it should be the simplest one for backport, :-)
IIUC, even Waiman's patch (flush in blkcg_destroy_blkcfs) would need synchronization for different CPU replicas flushes in blkcg_iostat_update, right?
- don't grab bio->bi_blkg reference when adding the stats into blkcg's
per-cpu stat list since all stats are guaranteed to be consumed before releasing blkg instance, and grabbing blkg reference for stats was the most fragile part of original patch
At one moment, the lhead -> blkcg_gq reference seemed alright to me and
But bio->bi_blkg has grabbed one reference in blk_cgroup_bio_start already.
consequently blkcg_gq -> blkcg is the one that looks reversed (forming
IMO, this one is correct, cause blkcg_gq depends on blkcg.
the cycle). But changing its direction would be much more fundamental change, it'd need also kind of blkcg_gq reparenting -- similarly to memcg.
Thanks, Ming