Hello John, Matt,
On 9/26/2025 5:35 AM, John Stultz wrote:
On Thu, Sep 25, 2025 at 6:33 AM Matt Fleming <matt@readmodwrite.com> wrote:
From: Matt Fleming <mfleming@cloudflare.com>
This reverts commit b7ca5743a2604156d6083b88cefacef983f3a3a6.
If we dequeue a task (task B) that was sched_delayed, then that task is definitely no longer on the rq and is no longer tracked in the rbtree. Unfortunately, task_on_rq_queued(B) will still return true because dequeue_task() doesn't update p->on_rq.
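(For reference, task_on_rq_queued() is just a comparison against p->on_rq, so any dequeue path that skips the final p->on_rq update leaves the task looking queued. Roughly, from kernel/sched/sched.h, modulo minor version differences:

	/* p->on_rq is 0, TASK_ON_RQ_QUEUED (1) or TASK_ON_RQ_MIGRATING (2) */
	static inline int task_on_rq_queued(struct task_struct *p)
	{
		return READ_ONCE(p->on_rq) == TASK_ON_RQ_QUEUED;
	}
)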
Hey! Sorry again that my patch has been causing you trouble. Thanks for your persistence in chasing this down!
It's confusing, as this patch used logic similar to what pick_next_entity() does when a sched_delayed task is picked to run, as well as what's done elsewhere in __sched_setscheduler() and in sched_ext, so I'd fret that similar problems may be lurking in those paths as well.
And my impression was that dequeue_task() on a sched_delayed task would update p->on_rq by calling __block_task() at the end of dequeue_entities().
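That tail (visible in the last hunks of the diff below) is the only place this path clears p->on_rq; roughly:

	if (p && task_delayed) {
		WARN_ON_ONCE(!task_sleep);
		WARN_ON_ONCE(p->on_rq != 1);
		...
		__block_task(rq, p);	/* this is what finally sets p->on_rq back to 0 */
	}

so bailing out before reaching it leaves p->on_rq == 1.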
However, there are two spots where we might exit dequeue_entities() early when cfs_rq_throttled(cfs_rq) is true, so maybe that's what's catching us here?
That could very likely be it.
Peter: Those cfs_rq_throttled() exits in dequeue_entities() seem a little odd, as the desired dequeue didn't really complete, but dequeue_task_fair() will still return true, indicating success (not that many places check the dequeue_task() return value). Is that right?
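(For context: dequeue_task_fair() only treats a negative return from dequeue_entities() as failure, so the throttled bailout's "return 0" is reported as success just like a completed dequeue's "return 1". Simplified sketch from kernel/sched/fair.c:

	static bool dequeue_task_fair(struct rq *rq, struct task_struct *p, int flags)
	{
		...
		if (dequeue_entities(rq, &p->se, flags) < 0)
			return false;
		...
		return true;
	}
)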
I think for the most part it has been harmless until now, as we couldn't pick from a throttled hierarchy, and other callers of dequeue_task(DEQUEUE_DELAYED) would later do a:
	queued = task_on_rq_queued(p);
	...
	if (queued)
		enqueue_task(p);
which would either spuriously run a blocked task (which would promptly block again), or a wakeup would properly wake the queued task via ttwu_runnable(). wait_task_inactive() is the interesting one, though, as it expects the dequeue to result in a block, which never happens with throttled hierarchies. I'm impressed the double dequeue doesn't result in any major splats!
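For context, wait_task_inactive() retries for as long as the task still looks queued, so a p->on_rq that never drops back to 0 keeps it spinning; heavily paraphrased from kernel/sched/core.c:

	for (;;) {
		...
		queued = task_on_rq_queued(p);
		...
		if (unlikely(queued)) {
			/* still looks on the rq: sleep briefly and retry */
			...
			continue;
		}
		break;	/* dequeue really completed: p->on_rq reached 0 */
	}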
Matt, if possible, can you try the patch attached below to check whether the bailout for the throttled hierarchy is indeed the root cause? Thanks in advance.
P.S. The per-task throttle in tip:sched/core would get rid of all this, but it would be good to have a fix via tip:sched/urgent so it can be backported to v6.12 LTS and the newer stable kernels.
---
Thanks and Regards,
Prateek
(Prepared on top of tip:sched/urgent but should apply fine on v6.17-rc7)
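The gist of the diff: instead of returning straight away on the two throttled bailouts, jump to the tail of dequeue_entities() so the task_delayed fixup still runs and __block_task() gets to clear p->on_rq.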
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 8ce56a8d507f..f0a4d9d7424d 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -6969,6 +6969,7 @@ static int dequeue_entities(struct rq *rq, struct sched_entity *se, int flags)
 	int h_nr_runnable = 0;
 	struct cfs_rq *cfs_rq;
 	u64 slice = 0;
+	int ret = 0; /* XXX: Do we care if ret is 0 vs 1 since we only check ret < 0? */
 
 	if (entity_is_task(se)) {
 		p = task_of(se);
@@ -6998,7 +6999,7 @@ static int dequeue_entities(struct rq *rq, struct sched_entity *se, int flags)
 
 		/* end evaluation on encountering a throttled cfs_rq */
 		if (cfs_rq_throttled(cfs_rq))
-			return 0;
+			goto out;
 
 		/* Don't dequeue parent if it has other entities besides us */
 		if (cfs_rq->load.weight) {
@@ -7039,7 +7040,7 @@ static int dequeue_entities(struct rq *rq, struct sched_entity *se, int flags)
 
 		/* end evaluation on encountering a throttled cfs_rq */
 		if (cfs_rq_throttled(cfs_rq))
-			return 0;
+			goto out;
 	}
 
 	sub_nr_running(rq, h_nr_queued);
@@ -7048,6 +7049,8 @@ static int dequeue_entities(struct rq *rq, struct sched_entity *se, int flags)
 	if (unlikely(!was_sched_idle && sched_idle_rq(rq)))
 		rq->next_balance = jiffies;
 
+	ret = 1;
+out:
 	if (p && task_delayed) {
 		WARN_ON_ONCE(!task_sleep);
 		WARN_ON_ONCE(p->on_rq != 1);
@@ -7063,7 +7066,7 @@ static int dequeue_entities(struct rq *rq, struct sched_entity *se, int flags)
 
 		__block_task(rq, p);
 	}
 
-	return 1;
+	return ret;
 }
 
 /*