On 6/6/22 05:39, Michal Koutný wrote:
On Fri, Jun 03, 2022 at 11:13:21AM -0700, Tadeusz Struktadeusz.struk@linaro.org wrote:
In such scenario the css_killed_work_fn will be en-queued via cgroup_apply_control_disable(cgrp)->kill_css(css), and bail out to cgroup_kn_unlock(). Then cgroup_kn_unlock() will call: cgroup_put(cgrp)->css_put(&cgrp->self), which will try to enqueue css_release_work_fn for the same css instance, causing a list_add corruption bug, as can be seen in the syzkaller report [1].
This hypothesis doesn't add up to me (I am sorry).
The kill_css(css) would be a css associated with a subsys (css.ss != NULL) whereas css_put(&cgrp->self) is a different css just for the cgroup (css.ss == NULL).
Yes, you are right. I couldn't figure it out where the extra css_put() is called from, and the only place that fitted into my theory was from the cgroup_kn_unlock() in cgroup_apply_control_disable(). After some more debugging I can see that, as you said, the cgrp->self is a different css. The offending _put() is actually called by the percpu_ref_kill_and_confirm(), as it not only calls the passed confirm_kill percpu_ref_func_t, but also it puts the refcnt iself. Because the cgroup_apply_control_disable() will loop for_each_live_descendant, and call css_kill() on all css'es, and css_killed_work_fn() will also loop and call css_put() on all parents, the css_release() will be called on the first parent prematurely, causing the BUG(). What I think should be done to balance put/get is to call css_get() for all the parents in kill_css():
diff --git a/kernel/cgroup/cgroup.c b/kernel/cgroup/cgroup.c index c1e1a5c34e77..3ca61325bc4e 100644 --- a/kernel/cgroup/cgroup.c +++ b/kernel/cgroup/cgroup.c @@ -5527,6 +5527,8 @@ static void css_killed_ref_fn(struct percpu_ref *ref) */ static void kill_css(struct cgroup_subsys_state *css) { + struct cgroup_subsys_state *_css = css; + lockdep_assert_held(&cgroup_mutex);
if (css->flags & CSS_DYING) @@ -5541,10 +5543,13 @@ static void kill_css(struct cgroup_subsys_state *css) css_clear_dir(css);
/* - * Killing would put the base ref, but we need to keep it alive - * until after ->css_offline(). + * Killing would put the base ref, but we need to keep it alive, + * and all its parents, until after ->css_offline(). */ - css_get(css); + do { + css_get(_css); + _css = _css->parent; + } while (_css && atomic_read(&_css->online_cnt));
/* * cgroup core guarantees that, by the time ->css_offline() is
This will be then "reverted" in css_killed_work_fn() Please let me know if it makes sense to you. I'm still testing it, but syzbot is very slow today.