On Wed, Dec 01, 2021 at 02:34:39PM +0100, Jan Kara jack@suse.cz wrote:
After some analysis we've found out that the culprit of the problem is that some task is reparented from cgroup G to the root cgroup and G is offlined.
Just sharing my interpretation for context -- (I saw this was a system using the unified cgroup hierarchy, io_cgrp_subsys_on_dfl_key was enabled) and what was observed could also have been disabling the io controller on given level -- that would also manifest similarly -- the task is migrated to parent and the former blkcg is offlined.
+static void bfq_reparent_children(struct bfq_data *bfqd, struct bfq_group *bfqg) [...]
- bfq_bfqq_move(bfqd, bfqq, bfqd->root_group);
[...]
- hlist_for_each_entry_safe(bfqq, next, &bfqg->children, children_node)
bfq_bfqq_move(bfqd, bfqq, bfqd->root_group);
Here I assume root_group is (representing) the global blkcg root and this reparenting thus skips all ancestors between the removed leaf and the root. IIUC the associated io_context would then be treated as if it was running in the root blkcg. (Admittedly, this isn't a change from this patch but it may cause some surprises if the given process runs after the operation.)
Reparenting to the immediate ancestors should be safe as cgroup core should ensure children are offlined before parents. Would it make sense to you?
@@ -897,38 +844,17 @@ static void bfq_pd_offline(struct blkg_policy_data *pd) [...]
* It may happen that some queues are still active
* (busy) upon group destruction (if the corresponding
* processes have been forced to terminate). We move
* all the leaf entities corresponding to these queues
* to the root_group.
This comment is removed but it seems to me it assumed that the reparented entities are only some transitional remainings of terminated tasks but they may be the processes migrated upwards with a long (IO active) life ahead.