On Fri, Sep 26, 2025 at 08:13:09AM +0530, K Prateek Nayak wrote:
> Peter: Those cfs_rq_throttled() exits in dequeue_entities() seem a little odd, as the desired dequeue didn't really complete, but dequeue_task_fair() will still return true indicating success - not that many places check the dequeue_task() return value. Is that right?
Bah, I'm forever confused on the throttle cases there :/
> I think for the most part it has been harmless until now: we cannot pick from a throttled hierarchy, and the other callers of dequeue_task(DEQUEUE_DELAYED) later do a
>
> 	queued = task_on_rq_queued(p);
> 	...
> 	if (queued)
> 		enqueue_task(p);
>
> which either leads to spuriously running a blocked task (it simply blocks again), or to a wakeup properly waking the queued task via ttwu_runnable(). wait_task_inactive() is the interesting one, though: it expects the dequeue to result in a block, which never happens with throttled hierarchies. I'm impressed the double dequeue doesn't result in any major splats!
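To make that wait_task_inactive() expectation concrete, here is a heavily simplified sketch (not the real implementation, which also rechecks task state and retries under the rq lock): the waiter effectively polls until the target drops off the runqueue, i.e. until something clears p->on_rq.

	/*
	 * Roughly what wait_task_inactive() waits for (simplified sketch):
	 * p->on_rq is only cleared once __block_task() runs.
	 */
	while (task_on_rq_queued(p))	/* p->on_rq == TASK_ON_RQ_QUEUED */
		cpu_relax();

If dequeue_entities() bails out on a throttled cfs_rq before __block_task() is reached, a delayed task keeps p->on_rq == 1 even though it can never be picked, so a loop like the above never terminates.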
> Matt, if possible can you try the patch attached below to check whether the bailout for throttled hierarchies is indeed the root cause. Thanks in advance.
>
> P.S. The per-task throttle in tip:sched/core would get rid of all this, but it would be good to have a fix via tip:sched/urgent so it can be backported to the v6.12 LTS and the newer stable kernels.
Yes, good riddance to that code :-)
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 8ce56a8d507f..f0a4d9d7424d 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -6969,6 +6969,7 @@ static int dequeue_entities(struct rq *rq, struct sched_entity *se, int flags)
>  	int h_nr_runnable = 0;
>  	struct cfs_rq *cfs_rq;
>  	u64 slice = 0;
> +	int ret = 0; /* XXX: Do we care if ret is 0 vs 1 since we only check ret < 0? */
Right, we don't appear to really need that.
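For context, the only consumer of that return value is dequeue_task_fair(), which treats everything except ret < 0 ("task stays queued as delayed") the same -- paraphrased from memory below, not quoted verbatim:

	static bool dequeue_task_fair(struct rq *rq, struct task_struct *p, int flags)
	{
		/* ... util_est accounting ... */
		if (dequeue_entities(rq, &p->se, flags) < 0)
			return false;	/* delayed dequeue: task remains queued */

		/* ... */
		return true;		/* 0 and 1 both end up here */
	}

So 0 vs 1 only matters for readability, not behaviour.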
>  
>  	if (entity_is_task(se)) {
>  		p = task_of(se);
> @@ -6998,7 +6999,7 @@ static int dequeue_entities(struct rq *rq, struct sched_entity *se, int flags)
>  
>  		/* end evaluation on encountering a throttled cfs_rq */
>  		if (cfs_rq_throttled(cfs_rq))
> -			return 0;
> +			goto out;
>  
>  		/* Don't dequeue parent if it has other entities besides us */
>  		if (cfs_rq->load.weight) {
> @@ -7039,7 +7040,7 @@ static int dequeue_entities(struct rq *rq, struct sched_entity *se, int flags)
>  
>  		/* end evaluation on encountering a throttled cfs_rq */
>  		if (cfs_rq_throttled(cfs_rq))
> -			return 0;
> +			goto out;
>  	}
>  
>  	sub_nr_running(rq, h_nr_queued);
> @@ -7048,6 +7049,8 @@ static int dequeue_entities(struct rq *rq, struct sched_entity *se, int flags)
>  	if (unlikely(!was_sched_idle && sched_idle_rq(rq)))
>  		rq->next_balance = jiffies;
>  
> +	ret = 1;
> +out:
>  	if (p && task_delayed) {
>  		WARN_ON_ONCE(!task_sleep);
>  		WARN_ON_ONCE(p->on_rq != 1);
> @@ -7063,7 +7066,7 @@ static int dequeue_entities(struct rq *rq, struct sched_entity *se, int flags)
>  		__block_task(rq, p);
>  	}
>  
> -	return 1;
> +	return ret;
>  }
So the difference is that we also do __block_task() when we get throttled somewhere in the hierarchy. IIRC when I was looking at this, I thought it wouldn't matter since it won't get picked anyway, on account of the cfs_rq being blocked/detached, so who cares.
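For reference, the relevant effect of __block_task(), sketched rather than quoted verbatim (the real helper lives in kernel/sched/sched.h and also does nr_uninterruptible/iowait accounting):

	static inline void __block_task(struct rq *rq, struct task_struct *p)
	{
		/* ... load accounting elided ... */

		/*
		 * The release store that actually marks the task blocked;
		 * after this, pollers of p->on_rq (wait_task_inactive() et
		 * al.) can make progress, and @p may be migrated away, so
		 * this must be the last access to @p.
		 */
		smp_store_release(&p->on_rq, 0);
	}

With the patch, this store now also happens when the dequeue bails out on a throttled hierarchy, which is exactly what the early returns skipped.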
But yeah, this makes sense.
Patch logistics are going to be a pain -- .17 is closed and the merge window is open, which means Linus will have the per-task throttle and /urgent doesn't work anymore.
At this point the best we can do is a patch to stable with a note that upstream is no longer affected due to the rework.