On 01-08-19, 09:47, Rafael J. Wysocki wrote:
On Thu, Aug 1, 2019 at 8:17 AM Viresh Kumar viresh.kumar@linaro.org wrote:
On 31-07-19, 17:20, Doug Smythies wrote:
Hi Viresh,
Summary:
The old way, using UINT_MAX had two purposes: first, as a "need to do a frequency update" flag; but also second, to force any subsequent old/new frequency comparison to NOT be "the same, so why bother actually updating" (see: sugov_update_next_freq). All patches so far have been dealing with the flag, but only partially the comparisons. In a busy system, and when schedutil.c doesn't actually know the currently set system limits, the new frequency is dominated by values the same as the old frequency. So, when sugov_fast_switch calls sugov_update_next_freq, false is usually returned.
And finally we know "Why" :)
Good work Doug. Thanks for taking it to the end.
However, if we move the resetting of the flag and add another condition to the "no need to actually update" decision, then perhaps this patch version 1 will be O.K. It seems to be. (see way later in this e-mail).
With all this new knowledge, how about going back to version 1 of this patch, and then adding this:
diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c index 808d32b..f9156db 100644 --- a/kernel/sched/cpufreq_schedutil.c +++ b/kernel/sched/cpufreq_schedutil.c @@ -100,7 +100,12 @@ static bool sugov_should_update_freq(struct sugov_policy *sg_policy, u64 time) static bool sugov_update_next_freq(struct sugov_policy *sg_policy, u64 time, unsigned int next_freq) {
if (sg_policy->next_freq == next_freq)
/*
* Always force an update if the flag is set, regardless.
* In some implementations (intel_cpufreq) the frequency is clamped
* further downstream, and might not actually be different here.
*/
if (sg_policy->next_freq == next_freq && !sg_policy->need_freq_update) return false;
This is not correct because this is an optimization we have in place to make things more efficient. And it was working by luck earlier and my patch broke it for good :)
OK, so since we know why it was wrong now, why don't we just revert it? Plus maybe add some comment explaining the rationale in there?
Because the patch [1] which caused these issues was almost correct, just that it missed the busy accounting for single CPU case.
The main idea behind the original patch [1] was to avoid any unwanted/hidden side-affects by overriding the value of next_freq. What we see above is exactly the case for that. Because we override the value of next_freq, we made intel-pstate work by chance, unintentionally. Which is wrong. And who knows what other side affects it had, we already found two (this one and the one fixed by [1]).
I would strongly suggest that we don't override the value of next_freq with special meaning, as it is used at so many places we don't know what it may result in.