On Thu, Mar 28, 2024 at 8:21 AM mingyang.cui <mingyang.cui@horizon.ai> wrote:
> When rt_mutex_setprio changes a task's scheduling class to RT, sometimes the task's vruntime is not updated correctly upon return to the fair class. Specifically, the following is being observed:
> - task has just been created and has run for a short time
> - task sleeps while still in the fair class
> - task is boosted to RT via rt_mutex_setprio, which changes the task to RT and calls check_class_changed
> - check_class_changed leads to detach_task_cfs_rq, at which point the vruntime_normalized check sees that the task's sum_exec_runtime is zero, which results in skipping the subtraction of the rq's min_vruntime from the task's vruntime
> - later, when the prio is deboosted and the task is moved back to the fair class, the fair rq's min_vruntime is added to the task's vruntime, even though it was never subtracted earlier
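For anyone following along, the check being described lives in vruntime_normalized(). Roughly (abridged from an older kernel/sched/fair.c, before the p->state to p->__state rename, so the exact form may differ by tree):

static inline bool vruntime_normalized(struct task_struct *p)
{
	struct sched_entity *se = &p->se;

	/* Queued/migrating tasks were already normalized at dequeue. */
	if (p->on_rq)
		return true;

	/*
	 * When !on_rq, vruntime of the task has usually NOT been
	 * normalized. But there are some cases where it has already
	 * been normalized:
	 *
	 * - A forked child which is waiting for being woken up by
	 *   wake_up_new_task().
	 * - A task which has been woken up by try_to_wake_up() and
	 *   waiting for actually being woken up by sched_ttwu_pending().
	 */
	if (!se->sum_exec_runtime ||
	    (p->state == TASK_WAKING && p->sched_remote_wakeup))
		return true;

	return false;
}

The first clause is the one the patch is concerned with: sum_exec_runtime == 0 is taken to mean "freshly forked, vruntime already normalized".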
Just to make sure I am following: since sum_exec_runtime is updated in update_curr(), if the task goes to sleep, shouldn't it be dequeued and thus update_curr() would have been run (and thus sum_exec_runtime would be non-zero)?
Maybe in this analysis the new task is being boosted while it is still newly running (instead of sleeping)?
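That is, the path I have in mind, roughly (abridged from an older kernel/sched/fair.c, so placement may vary by version):

static void
dequeue_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int flags)
{
	/*
	 * Update run-time statistics of the 'current'.
	 */
	update_curr(cfs_rq);
	...
}

where update_curr() does curr->sum_exec_runtime += delta_exec; so a task that actually ran before sleeping should come out of dequeue with a non-zero sum_exec_runtime.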
> Since the task's vruntime is about double that of the other tasks in the cfs_rq, the task is unable to run for a long time when there are continuously runnable tasks in the cfs_rq.
> The immediate result is inflation of the task's vruntime, giving it lower priority (starving it if there's enough available work). The longer-term effect is inflation of all vruntimes because the task's vruntime becomes the rq's min_vruntime when the higher priority tasks go idle. That leads to a vicious cycle, where the vruntime inflation is repeatedly doubled.
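If I'm following the arithmetic, the inflation looks something like this (made-up numbers, just to illustrate):

	at detach:  cfs_rq->min_vruntime = 100ms, task vruntime = 105ms
	            (the ~100ms baseline is erroneously never subtracted)
	at attach:  task vruntime += new min_vruntime (say 110ms) -> ~215ms
	            vs. ~110ms for its peers, i.e. roughly doubled

and once the other tasks go idle, that inflated value becomes the rq's new min_vruntime baseline, so the next round doubles it again.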
This is an interesting find! I'm curious how you detected the problem, as it might make a good correctness test (which I'm selfishly looking for more of myself :)
> The root cause of the problem is that vruntime_normalized() makes a misjudgment: since the sum_exec_runtime of some tasks that were just created and have run only for a short time is zero, vruntime_normalized() mistakenly treats them as tasks that have just been forked. Therefore, the cfs_rq's min_vruntime is not subtracted from the task's vruntime.
> So, we fix this bug by adding a check condition for newly forked tasks.
> Signed-off-by: mingyang.cui <mingyang.cui@horizon.ai>
> ---
>  kernel/sched/fair.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 73a89fbd81be..3d0c14f3731f 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -11112,7 +11112,7 @@ static inline bool vruntime_normalized(struct task_struct *p)
>  	 * - A task which has been woken up by try_to_wake_up() and
>  	 *   waiting for actually being woken up by sched_ttwu_pending().
>  	 */
> -	if (!se->sum_exec_runtime ||
> +	if (!se->sum_exec_runtime && p->state == TASK_NEW ||
>  	    (p->state == TASK_WAKING && p->sched_remote_wakeup))
>  		return true;
> 
This looks like it was applied against an older tree? The p->state accesses should be under a READ_ONCE (and likely consolidated before the conditional?)
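i.e. something like (untested, and note that newer trees spell it p->__state):

	unsigned int state = READ_ONCE(p->__state);

	if ((!se->sum_exec_runtime && state == TASK_NEW) ||
	    (state == TASK_WAKING && p->sched_remote_wakeup))
		return true;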
Also, I wonder if, alternatively, a call to update_curr() when cfs_rq->curr == se in switched_from_fair() would be good (i.e.: normalize it on the boost to close the edge case, rather than handling it as an expected non-normalized condition)?
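Roughly what I'm picturing, as an untested sketch against the same older tree (not a real patch):

static void switched_from_fair(struct rq *rq, struct task_struct *p)
{
	struct sched_entity *se = &p->se;
	struct cfs_rq *cfs_rq = cfs_rq_of(se);

	/*
	 * If the task being boosted is what the cfs_rq is currently
	 * running, fold its pending runtime into sum_exec_runtime
	 * before detaching, so vruntime_normalized() can't mistake
	 * it for a freshly forked task.
	 */
	if (cfs_rq->curr == se)
		update_curr(cfs_rq);

	detach_task_cfs_rq(p);
}

That would normalize the accounting at boost time instead of special-casing TASK_NEW at detach.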
thanks -john