Damn, I wrote a reply on the 20th but seems like I never sent it. At least I found it saved somewhere, so I don't have to rewrite it...
On 20/11/24 10:03, Peter Zijlstra wrote:
On Tue, Nov 19, 2024 at 04:30:02PM -0800, Chenbo Lu wrote:
Hello,
I am experiencing a significant performance degradation after upgrading my kernel from version 6.6 to 6.8 and would appreciate any insights or suggestions.
I am running a high-load simulation system that spawns more than 1000 threads and the overall CPU usage is 30%+ . Most of the threads are using real-time scheduling (SCHED_RR), and the threads of a model are using SCHED_DEADLINE. After upgrading the kernel, I noticed that the execution time of my model has increased from 4.5ms to 6ms.
What I Have Done So Far:
- I found this [bug
report](https://bugzilla.kernel.org/show_bug.cgi?id=219366#c7) and reverted the commit efa7df3e3bb5da8e6abbe37727417f32a37fba47 mentioned in the post. Unfortunately, this did not resolve the issue. 2. I performed a git bisect and found that after these two commits related to scheduling (RT and deadline) were merged, the problem happened. They are 612f769edd06a6e42f7cd72425488e68ddaeef0a, 5fe7765997b139e2d922b58359dea181efe618f9
And yet you failed to Cc Valentin, the author of said commits :/
After reverting these two commits, the model execution time improved to around 5 ms. 3. I revert two more commits, and the execution time is back to 4.7ms: 63ba8422f876e32ee564ea95da9a7313b13ff0a1, efa7df3e3bb5da8e6abbe37727417f32a37fba47
My questions are: 1.Has anyone else experienced similar performance degradation after upgrading to kernel 6.8?
This is 4 kernel releases back, I my memory isn't that long.
2.Can anyone explain why these two commits are causing the problem? I am not very familiar with the kernel code and would appreciate any insights.
There might be a race window between setting the tro and sending the IPI, such that previously the extra IPIs would sooner find the newly pushable task.
Valentin, would it make sense to set tro before enqueueing the pushable, instead of after it?
Urgh, those cachelines are beyond cold...
/me goes reading
Ok yeah I guess we could have this race vs
rto_push_irq_work_func() `\ rto_next_cpu()
Not sure if that applies to DL too since it doesn't have the PUSH_IPI thing, but anyway - Chenbo, could you please try the below? --- diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c index d9d5a702f1a61..270a25335c4bc 100644 --- a/kernel/sched/deadline.c +++ b/kernel/sched/deadline.c @@ -602,16 +602,16 @@ static void enqueue_pushable_dl_task(struct rq *rq, struct task_struct *p)
WARN_ON_ONCE(!RB_EMPTY_NODE(&p->pushable_dl_tasks));
+ if (!rq->dl.overloaded) { + dl_set_overload(rq); + rq->dl.overloaded = 1; + } + leftmost = rb_add_cached(&p->pushable_dl_tasks, &rq->dl.pushable_dl_tasks_root, __pushable_less); if (leftmost) rq->dl.earliest_dl.next = p->dl.deadline; - - if (!rq->dl.overloaded) { - dl_set_overload(rq); - rq->dl.overloaded = 1; - } }
static void dequeue_pushable_dl_task(struct rq *rq, struct task_struct *p) diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c index bd66a46b06aca..1ea45a45ed657 100644 --- a/kernel/sched/rt.c +++ b/kernel/sched/rt.c @@ -385,6 +385,15 @@ static inline void rt_queue_pull_task(struct rq *rq)
static void enqueue_pushable_task(struct rq *rq, struct task_struct *p) { + /* + * Set RTO first so rto_push_irq_work_func() can see @rq as a push + * candidate as early as possible. + */ + if (!rq->rt.overloaded) { + rt_set_overload(rq); + rq->rt.overloaded = 1; + } + plist_del(&p->pushable_tasks, &rq->rt.pushable_tasks); plist_node_init(&p->pushable_tasks, p->prio); plist_add(&p->pushable_tasks, &rq->rt.pushable_tasks); @@ -392,15 +401,15 @@ static void enqueue_pushable_task(struct rq *rq, struct task_struct *p) /* Update the highest prio pushable task */ if (p->prio < rq->rt.highest_prio.next) rq->rt.highest_prio.next = p->prio; - - if (!rq->rt.overloaded) { - rt_set_overload(rq); - rq->rt.overloaded = 1; - } }
static void dequeue_pushable_task(struct rq *rq, struct task_struct *p) { + /* + * To match enqueue we should check/unset RTO first, but for that we + * need to pop @p first. This makes this asymmetric wrt enqueue, but + * the worst we can get out of this is an extra useless IPI. + */ plist_del(&p->pushable_tasks, &rq->rt.pushable_tasks);
/* Update the new highest prio pushable task */