On Wed, Nov 20, 2024 at 10:03:54AM +0100, Peter Zijlstra wrote:
On Tue, Nov 19, 2024 at 04:30:02PM -0800, Chenbo Lu wrote:
Hello,
I am experiencing a significant performance degradation after upgrading my kernel from version 6.6 to 6.8 and would appreciate any insights or suggestions.
I am running a high-load simulation system that spawns more than 1000 threads, and the overall CPU usage is 30%+. Most of the threads are using real-time scheduling (SCHED_RR), and the threads of a model are using SCHED_DEADLINE. After upgrading the kernel, I noticed that the execution time of my model has increased from 4.5 ms to 6 ms.
What I Have Done So Far:
1. I found this [bug report](https://bugzilla.kernel.org/show_bug.cgi?id=219366#c7) and reverted the commit efa7df3e3bb5da8e6abbe37727417f32a37fba47 mentioned in the post. Unfortunately, this did not resolve the issue.
2. I performed a git bisect and found that after these two commits related to scheduling (RT and deadline) were merged, the problem happened. They are 612f769edd06a6e42f7cd72425488e68ddaeef0a, 5fe7765997b139e2d922b58359dea181efe618f9
And yet you failed to Cc Valentin, the author of said commits :/
After reverting these two commits, the model execution time improved to around 5 ms.
3. I reverted two more commits, and the execution time is back to 4.7 ms: 63ba8422f876e32ee564ea95da9a7313b13ff0a1, efa7df3e3bb5da8e6abbe37727417f32a37fba47
My questions are:
1. Has anyone else experienced similar performance degradation after upgrading to kernel 6.8?
This is 4 kernel releases back; my memory isn't that long.
2. Can anyone explain why these two commits are causing the problem? I am not very familiar with the kernel code and would appreciate any insights.
There might be a race window between setting the tro and sending the IPI, such that previously the extra IPIs would sooner find the newly pushable task.
Valentin, would it make sense to set tro before enqueueing the pushable, instead of after it?
s/tro/rto/ clearly I'm consistently not capable of typing that :-)
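
To make the ordering question concrete, here is a toy user-space model of the two orderings. This is not kernel code: `struct toy_rq`, `remote_sees_overload()`, `enqueue_then_flag()` and `flag_then_enqueue()` are all made-up names, and it simply assumes a remote CPU peeks at the rto flag without holding the runqueue lock. It only shows what such an observer would see if it looked in between the two steps; whether that window explains the regression is still a guess.

```c
#include <stdbool.h>

/* Toy stand-in for a runqueue; NOT the kernel's struct rq. */
struct toy_rq {
    int  nr_pushable;  /* tasks on the (hypothetical) pushable list */
    bool rto;          /* overload flag a remote CPU checks locklessly */
};

/* What a remote CPU concludes if it peeks at the flag right now. */
static bool remote_sees_overload(const struct toy_rq *rq)
{
    return rq->rto;
}

/*
 * Ordering A: enqueue the pushable task first, set rto afterwards.
 * Returns what a remote observer sees in between the two steps.
 */
static bool enqueue_then_flag(struct toy_rq *rq)
{
    bool seen;

    rq->nr_pushable++;
    seen = remote_sees_overload(rq);  /* observer lands inside the window */
    rq->rto = true;
    return seen;
}

/*
 * Ordering B: set rto first, then enqueue the pushable task.
 * Same observer, same point in time relative to the two steps.
 */
static bool flag_then_enqueue(struct toy_rq *rq)
{
    bool seen;

    rq->rto = true;
    seen = remote_sees_overload(rq);  /* observer lands inside the window */
    rq->nr_pushable++;
    return seen;
}
```

Calling `enqueue_then_flag()` on a zeroed `struct toy_rq` returns false (the task is already pushable but the observer does not see overload yet), while `flag_then_enqueue()` returns true. The cost of ordering B in this model is that the observer may see the flag before the task is actually on the list.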