On Tue, Oct 14, 2025 at 7:20 PM Christian Loehle christian.loehle@arm.com wrote:
On 10/14/25 16:54, Rafael J. Wysocki wrote:
On Tue, Oct 14, 2025 at 5:11 PM Christian Loehle christian.loehle@arm.com wrote:
On 10/14/25 12:55, Sergey Senozhatsky wrote:
On (25/10/14 11:25), Christian Loehle wrote:
On 10/14/25 11:23, Sergey Senozhatsky wrote:
On (25/10/14 10:50), Christian Loehle wrote:
>> Upstream fixup fa3fa55de0d ("cpuidle: governors: menu: Avoid using
>> invalid recent intervals data") doesn't address the problems we are
>> observing. Revert seems to be bringing performance metrics back to
>> pre-regression levels.
>
> Any details would be much appreciated.
> How do the idle state usages differ with and without
> "cpuidle: menu: Avoid discarding useful information"?
> What do the idle states look like in your platform?
Sure, I can run tests. How do I get the numbers/stats that you are asking for?
Ideally just dump cat /sys/devices/system/cpu/cpu*/cpuidle/state*/* before and after the test.
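If it helps, here is a minimal Python sketch of such a dump (the sysfs layout and attribute names are the standard cpuidle ones; the base path is a parameter only so the helper can be exercised against a fake directory tree):

```python
import glob
import os

def snapshot_cpuidle(base="/sys/devices/system/cpu"):
    """Collect per-CPU idle-state counters from sysfs, keyed by state directory."""
    stats = {}
    pattern = os.path.join(base, "cpu[0-9]*/cpuidle/state[0-9]*")
    for state_dir in sorted(glob.glob(pattern)):
        entry = {}
        # 'rejected' only exists on newer kernels, hence the existence check.
        for attr in ("name", "usage", "time", "above", "below", "rejected"):
            path = os.path.join(state_dir, attr)
            if os.path.exists(path):
                with open(path) as f:
                    entry[attr] = f.read().strip()
        stats[state_dir] = entry
    return stats

if __name__ == "__main__":
    # Take one snapshot before and one after the test run, then diff them.
    for state_dir, entry in snapshot_cpuidle().items():
        print(state_dir, entry)
```

Running it once before and once after the workload and subtracting the counters gives the per-state deltas.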
OK, got some data for you. The terminology being used here is as follows:
6.1-base is 6.1 stable with a9edb700846 "cpuidle: menu: Avoid discarding useful information"
6.1-base-fixup is 6.1 stable with a9edb700846 and fa3fa55de0d6 "cpuidle: governors: menu: Avoid using invalid recent intervals data" cherry-pick
6.1-revert is 6.1 stable with a9edb700846 reverted (and no fixup commit, obviously)
Just to show the scale of regression, results of some of the benchmarks:
6.1-base: 84.5
6.1-base-fixup: 76.5
6.1-revert: 59.5
(lower is better, 6.1-revert has the same results as previous stable kernels).
This immediately threw me off. The fixup was written for a specific system with completely broken cpuidle; it shouldn't significantly affect any sane system. I double-checked the numbers and your system looks fine; in fact none of the tests had any rejected cpuidle occurrences, so functionally base and base-fixup are identical for you. The cpuidle numbers are also reasonably 'in the noise', so for the future, some statistics on those benchmark scores would be helpful.
I can see a huge difference between base and revert in terms of cpuidle, so that's enough for me to take a look, I'll do that now. (6.1-revert has more C3_ACPI in favor of C1_ACPI.)
(Also I can't send this email without at least recommending teo instead of menu for your platform / use-cases, if you deemed it unfit I'd love to know what didn't work for you!)
Well, yeah.
So I've already done some analysis.
There are 4 C-states, POLL, C1, C6 and C10 (at least that's what the MWAIT hints tell me).
This is how many times each of them was requested during the workload run on base 6.1.y:
POLL: 21445 C1: 2993722 C6: 767029 C10: 736854
and in percentage of the total idle state requests:
POLL: 0.47% C1: 66.25% C6: 16.97% C10: 16.31%
With the problematic commit reverted, this became
POLL: 16092 C1: 2452591 C6: 750933 C10: 1150259
and (again) in percentage of the total:
POLL: 0.37% C1: 56.12% C6: 17.18% C10: 26.32%
Overall, POLL is negligible and the revert had no effect on the number of times C6 was requested. The difference is in C1 and C10, and it's about 10 percentage points in both cases, but going in opposite directions so to speak: C1 was requested 10 points less often and C10 was requested 10 points more often after the revert.
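For anyone reproducing the arithmetic, the shares above are simply each state's request count over the total; a sketch using the counts quoted in this message:

```python
def shares(counts):
    """Return each state's percentage of the total, rounded to two decimals."""
    total = sum(counts.values())
    return {state: round(100 * n / total, 2) for state, n in counts.items()}

# Request counts from the workload runs quoted above.
base = {"POLL": 21445, "C1": 2993722, "C6": 767029, "C10": 736854}
revert = {"POLL": 16092, "C1": 2452591, "C6": 750933, "C10": 1150259}

print(shares(base))    # C1 ~66.25%, C10 ~16.31%
print(shares(revert))  # C1 ~56.12%, C10 ~26.32%
```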
Let's see how this corresponds to the residency numbers.
For base 6.1.y there was
POLL: 599883 C1: 732303748 C6: 576785253 C10: 2020491489
and in percentage of the total
POLL: 0.02% C1: 21.99% C6: 17.32% C10: 60.67%
After the revert it became
POLL: 469451 C1: 517623465 C6: 508945687 C10: 2567701673
and in percentage of the total
POLL: 0.01% C1: 14.40% C6: 14.16% C10: 71.43%
so with the revert the CPUs spend around 7% more time in deep idle states (C6 and C10 combined).
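The "around 7%" follows from combining the C6 and C10 residency shares in each case; a quick check against the raw residency numbers above:

```python
def deep_share(times):
    """Percentage of total idle residency spent in the deep states (C6 + C10)."""
    total = sum(times.values())
    return 100 * (times["C6"] + times["C10"]) / total

# Residency numbers from the two runs quoted above.
base = {"POLL": 599883, "C1": 732303748, "C6": 576785253, "C10": 2020491489}
revert = {"POLL": 469451, "C1": 517623465, "C6": 508945687, "C10": 2567701673}

print(deep_share(base))                      # ~77.99
print(deep_share(revert))                    # ~85.59
print(deep_share(revert) - deep_share(base)) # ~7.6 percentage points
```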
I have to say that this is consistent with the intent of the problematic commit, which is to reduce the number of times the deepest idle state is requested when it is likely to be too deep.
However, on the system in question this somehow causes performance to drop significantly (even though shallow idle states are used more often which should result in lower average idle state exit latency and better performance).
One possible explanation is that this somehow affects turbo frequencies. That is, requesting shallower idle states on idle CPUs prevents the other CPUs from getting sufficiently high turbo.
Sergey, can you please run the workload under turbostat on the base 6.1.y and on 6.1.y with the problematic commit reverted and send the turbostat output from both runs (note: turbostat needs to be run as root)?
That's the most plausible explanation and would also be my guess. FWIW most of the C3_ACPI (== C10) with revert are objectively wrong with 78% idle misses (they were already pretty high with base around 72.5%).
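(The miss percentages quoted here are just the 'above' counter relative to usage for that state; a sketch of that arithmetic, with the C3_ACPI numbers taken from the delta tables at the end of this message:)

```python
def above_miss_pct(above, usage):
    """Percentage of entries into a state that were too deep ('above' events)."""
    return 100 * above / usage

# C3_ACPI (== C10) per-state deltas from the tables below.
print(above_miss_pct(534424, 736854))    # base:   ~72.5
print(above_miss_pct(895311, 1150259))   # revert: ~77.8
```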
Yeah, so the commit in question works as intended and from a purely cpuidle standpoint it is an improvement.
However, for the system/workload combination in question, cpuidle really is used as a means to crank up single-thread CPU performance.
Honestly, I'm not really sure what to do about this. It looks like this workload might benefit from selecting C6 more often, but I'm not quite sure how to make that happen in a generic way. Also, it may regress other workloads.
I'll leave this here for easier following:
Thanks!
===== 6.1-base: after minus before deltas (aggregated across CPUs) =====
+---------+-------------+------------+--------------+---------------+------------+------------+---------+
| state   | time_diff_s | usage_diff | avg_resid_us | rejected_diff | above_diff | below_diff | share_% |
+---------+-------------+------------+--------------+---------------+------------+------------+---------+
| POLL    |       0.600 |     21,445 |         28.0 |             0 |          0 |     19,846 |    0.02 |
| C1_ACPI |     732.304 |  2,993,722 |        244.6 |             0 |      3,816 |    280,613 |   21.99 |
| C2_ACPI |     576.785 |    767,029 |        752.0 |             0 |    272,105 |        453 |   17.32 |
| C3_ACPI |   2,020.491 |    736,854 |      2,742.1 |             0 |    534,424 |          0 |   60.67 |
| TOTAL   |   3,330.180 |  4,519,050 |              |             0 |    810,345 |    300,912 |  100.00 |
+---------+-------------+------------+--------------+---------------+------------+------------+---------+
===== 6.1-revert: after minus before deltas (aggregated across CPUs) =====
+---------+-------------+------------+--------------+---------------+------------+------------+---------+
| state   | time_diff_s | usage_diff | avg_resid_us | rejected_diff | above_diff | below_diff | share_% |
+---------+-------------+------------+--------------+---------------+------------+------------+---------+
| POLL    |       0.469 |     16,092 |         29.2 |             0 |          0 |     14,855 |    0.01 |
| C1_ACPI |     517.623 |  2,452,591 |        211.1 |             0 |      4,109 |    150,500 |   14.40 |
| C2_ACPI |     508.946 |    750,933 |        677.8 |             0 |    327,457 |        427 |   14.16 |
| C3_ACPI |   2,567.702 |  1,150,259 |      2,232.3 |             0 |    895,311 |          0 |   71.43 |
| TOTAL   |   3,594.740 |  4,369,875 |              |             0 |  1,226,877 |    165,782 |  100.00 |
+---------+-------------+------------+--------------+---------------+------------+------------+---------+
===== 6.1-revert minus 6.1-base (state-by-state deltas of the deltas) =====
+---------+-----------+----------+----------+---------------+----------+----------+
| state   | Δshare_pp | Δusage   | Δtime_s  | Δavg_resid_us | Δabove   | Δbelow   |
+---------+-----------+----------+----------+---------------+----------+----------+
| POLL    |     -0.00 |   -5,353 |   -0.130 |           1.2 |       +0 |   -4,991 |
| C1_ACPI |     -7.59 | -541,131 | -214.680 |         -33.6 |     +293 | -130,113 |
| C2_ACPI |     -3.16 |  -16,096 |  -67.840 |         -74.2 |  +55,352 |      -26 |
| C3_ACPI |    +10.76 | +413,405 |  547.210 |        -509.8 | +360,887 |       +0 |
| TOTAL   |     +0.00 | -149,175 |  264.560 |               | +416,532 | -135,130 |
+---------+-----------+----------+----------+---------------+----------+----------+
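The third table is mechanical: subtract the base deltas from the revert deltas, state by state. A minimal sketch of that step (only usage and share shown, with two states as input; the field names mirror the table columns above):

```python
def diff_tables(base, revert):
    """Per-state (revert - base) differences for usage count and residency share."""
    out = {}
    for state in base:
        out[state] = {
            "d_usage": revert[state]["usage"] - base[state]["usage"],
            "d_share_pp": round(revert[state]["share"] - base[state]["share"], 2),
        }
    return out

# Values copied from the 6.1-base and 6.1-revert tables above.
base = {
    "C1_ACPI": {"usage": 2993722, "share": 21.99},
    "C3_ACPI": {"usage": 736854, "share": 60.67},
}
revert = {
    "C1_ACPI": {"usage": 2452591, "share": 14.40},
    "C3_ACPI": {"usage": 1150259, "share": 71.43},
}

print(diff_tables(base, revert))  # C1_ACPI: -541,131 requests; C3_ACPI: +10.76 pp
```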