On Tue 07-12-21 20:08:43, Michal Koutný wrote:
On Wed, Dec 01, 2021 at 02:34:39PM +0100, Jan Kara jack@suse.cz wrote:
After some analysis we've found out that the culprit of the problem is that some task is reparented from cgroup G to the root cgroup and G is offlined.
Just sharing my interpretation for context -- (I saw this was a system using the unified cgroup hierarchy, io_cgrp_subsys_on_dfl_key was enabled) and what was observed could also have been disabling the io controller on given level -- that would also manifest similarly -- the task is migrated to parent and the former blkcg is offlined.
Yes, that's another possibility.
+static void bfq_reparent_children(struct bfq_data *bfqd, struct bfq_group *bfqg) [...]
- bfq_bfqq_move(bfqd, bfqq, bfqd->root_group);
[...]
- hlist_for_each_entry_safe(bfqq, next, &bfqg->children, children_node)
bfq_bfqq_move(bfqd, bfqq, bfqd->root_group);
Here I assume root_group is (representing) the global blkcg root and this reparenting thus skips all ancestors between the removed leaf and the root. IIUC the associated io_context would then be treated as if it was running in the root blkcg. (Admittedly, this isn't a change from this patch but it may cause some surprises if the given process runs after the operation.)
Yes, this is what happens in bfq_reparent_children() and basically preserves what BFQ was already doing for a subset of bfq queues.
Reparenting to the immediate ancestors should be safe as cgroup core should ensure children are offlined before parents. Would it make sense to you?
I suppose yes, it makes more sense to reparent just to immediate parents instead of the root of the blkcg hierarchy. Initially when developing the patch I was not sure whether parent has to be still alive but as you write it should be safe. I'll modify the patch to:
static void bfq_reparent_children(struct bfq_data *bfqd, struct bfq_group *bfqg) { struct bfq_queue *bfqq; struct hlist_node *next; struct bfq_group *parent;
parent = bfqg_parent(bfqg); if (!parent) parent = bfqd->root_group;
hlist_for_each_entry_safe(bfqq, next, &bfqg->children, children_node) bfq_bfqq_move(bfqd, bfqq, parent); }
@@ -897,38 +844,17 @@ static void bfq_pd_offline(struct blkg_policy_data *pd) [...]
* It may happen that some queues are still active
* (busy) upon group destruction (if the corresponding
* processes have been forced to terminate). We move
* all the leaf entities corresponding to these queues
* to the root_group.
This comment is removed but it seems to me it assumed that the reparented entities are only some transitional remainings of terminated tasks but they may be the processes migrated upwards with a long (IO active) life ahead.
Yes, this seemed to be a misconception of the old code...
Honza