On Mon, Jul 29, 2019 at 10:32 AM Viresh Kumar <viresh.kumar@linaro.org> wrote:
On 29-07-19, 00:55, Doug Smythies wrote:
On 2019.07.25 23:58 Viresh Kumar wrote:
Hmm, so I tried to reproduce your setup on my ARM board.
- booted with only CPU0, so that I hit the sugov_update_single() routine
- And applied the diff below to make the CPU look permanently busy:
-------------------------8<-------------------------
diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
index 2f382b0959e5..afb47490e5dc 100644
--- a/kernel/sched/cpufreq_schedutil.c
+++ b/kernel/sched/cpufreq_schedutil.c
@@ -121,6 +121,7 @@ static void sugov_fast_switch(struct sugov_policy *sg_policy, u64 time,
 	if (!sugov_update_next_freq(sg_policy, time, next_freq))
 		return;
 
+	pr_info("%s: %d: %u\n", __func__, __LINE__, freq);
?? There is no "freq" variable here, so this doesn't compile. However, this works:
pr_info("%s: %d: %u\n", __func__, __LINE__, next_freq);
There are two paths for changing the frequency: the normal, sleepable path (sugov_work) and the fast path. Any given driver only ever uses one of them. In your case it is always the fast path, and in mine it was the slow path.
I only tested the diff with the slow path and copy-pasted the print into the fast path when sending it to you, hence the build issue. Sorry about that.
Also make sure that the print is added after sugov_update_next_freq() is called, not before it.
 	next_freq = cpufreq_driver_fast_switch(policy, next_freq);
 	if (!next_freq)
 		return;
@@ -424,14 +425,10 @@ static unsigned long sugov_iowait_apply(struct sugov_cpu *sg_cpu, u64 time,
 #ifdef CONFIG_NO_HZ_COMMON
 static bool sugov_cpu_is_busy(struct sugov_cpu *sg_cpu)
 {
-	unsigned long idle_calls = tick_nohz_get_idle_calls_cpu(sg_cpu->cpu);
-	bool ret = idle_calls == sg_cpu->saved_idle_calls;
-
-	sg_cpu->saved_idle_calls = idle_calls;
-	return ret;
+	return true;
 }
 #else
-static inline bool sugov_cpu_is_busy(struct sugov_cpu *sg_cpu) { return false; }
+static inline bool sugov_cpu_is_busy(struct sugov_cpu *sg_cpu) { return true; }
 #endif /* CONFIG_NO_HZ_COMMON */
 
 /*
@@ -565,6 +562,7 @@ static void sugov_work(struct kthread_work *work)
 	sg_policy->work_in_progress = false;
 	raw_spin_unlock_irqrestore(&sg_policy->update_lock, flags);
 
+	pr_info("%s: %d: %u\n", __func__, __LINE__, freq);
 	mutex_lock(&sg_policy->work_lock);
 	__cpufreq_driver_target(sg_policy->policy, freq, CPUFREQ_RELATION_L);
 	mutex_unlock(&sg_policy->work_lock);
-------------------------8<-------------------------
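For readers following along, the heuristic that this debug diff short-circuits can be modelled in a few lines. The sketch below is a toy user-space model, not kernel code: toy_sg_cpu, toy_cpu_is_busy() and toy_pick_freq() are hypothetical names. It assumes, as the removed lines of sugov_cpu_is_busy() suggest, that a CPU counts as busy when its idle-call counter has not moved since the previous update, and that a busy CPU keeps its previous frequency instead of lowering it.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Toy model (not kernel code) of the idea behind sugov_cpu_is_busy():
 * the CPU counts as busy if its idle-call counter has not advanced
 * since the previous governor update. All names are hypothetical.
 */
struct toy_sg_cpu {
	unsigned long saved_idle_calls; /* counter seen at the last update */
};

static bool toy_cpu_is_busy(struct toy_sg_cpu *c, unsigned long idle_calls)
{
	bool busy = (idle_calls == c->saved_idle_calls);

	c->saved_idle_calls = idle_calls; /* remember for the next check */
	return busy;
}

/*
 * Toy version of the rule that consumes the busy flag: while the CPU is
 * busy, a request to lower the frequency is ignored and the previous
 * value is kept. Raising the frequency is always allowed.
 */
static unsigned int toy_pick_freq(bool busy, unsigned int prev_freq,
				  unsigned int wanted_freq)
{
	if (busy && wanted_freq < prev_freq)
		return prev_freq;
	return wanted_freq;
}
```

If toy_cpu_is_busy() is replaced by something that always returns true, as the diff does, toy_pick_freq() can only ever raise the frequency, which matches the observation that it climbs to the maximum and stays there.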
Now the frequency never goes down, so it ends up at the maximum possible value after a while.
- Then I did:
echo <any-low-freq-value> > /sys/devices/system/cpu/cpufreq/policy0/scaling_max_freq
Without my patch applied: the print never appears, so the frequency doesn't go down.
With my patch applied: the print from sugov_work() appears immediately and the frequency goes down.
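A toy model of the two outcomes above, under stated assumptions: the hypothetical struct toy_policy stands in for the cpufreq policy, toy_set_max() stands in for writing scaling_max_freq, and the honour_limits_change argument of toy_update() switches between "without the patch" (false) and "with the patch" (true). It only illustrates the described behaviour; it is not the actual patch.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Toy model (hypothetical names, not the real patch): a limits change
 * should bypass the "keep the old frequency while busy" rule so that a
 * lower maximum takes effect right away.
 */
struct toy_policy {
	unsigned int max;       /* stands in for policy->max */
	unsigned int next_freq; /* last frequency the governor asked for */
	bool limits_changed;    /* set when max is updated */
};

static void toy_set_max(struct toy_policy *p, unsigned int new_max)
{
	p->max = new_max;
	p->limits_changed = true;
}

static unsigned int toy_update(struct toy_policy *p, bool busy,
			       unsigned int wanted, bool honour_limits_change)
{
	/* Clamp the request to the current maximum. */
	unsigned int f = wanted > p->max ? p->max : wanted;

	/* Busy hold: refuse to lower, unless a limits change must win. */
	if (busy && f < p->next_freq &&
	    !(honour_limits_change && p->limits_changed))
		f = p->next_freq;

	p->limits_changed = false;
	p->next_freq = f;
	return f;
}
```

In this model, a busy CPU without the bypass keeps its old (higher) value after the maximum is lowered, while with the bypass it drops to the new maximum on the next update.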
Can you try this diff along with my Patch2? I suspect something may be wrong with the intel_cpufreq driver, as the patch fixes the only path in the schedutil governor that takes the busyness of a CPU into account.
With this diff along with your patch2, there is never a print message from sugov_work. There are messages from sugov_fast_switch.
Which is okay; sugov_work won't get hit in your case, as I explained above.
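The two commit paths described above can be illustrated with a small toy model. The names (toy_gov, toy_commit(), enum toy_path) are hypothetical. The sketch only assumes what is stated in this thread, namely that a driver uses either the fast path (sugov_fast_switch()) or the sleepable path (sugov_work()) and never both, plus an early return when the requested frequency has not changed, similar to what sugov_update_next_freq() does. It also shows why a print placed after that check fires only when a new value is actually committed.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Toy model of the two ways schedutil can commit a frequency.
 * Names are hypothetical; in the kernel the fast path goes through
 * sugov_fast_switch() and the sleepable path through sugov_work().
 */
enum toy_path { TOY_PATH_NONE, TOY_PATH_FAST, TOY_PATH_SLOW };

struct toy_gov {
	bool fast_switch_enabled; /* fixed per driver: fast or slow, never both */
	unsigned int next_freq;   /* last committed request */
};

static enum toy_path toy_commit(struct toy_gov *g, unsigned int freq)
{
	/* Nothing to do if the request did not change. */
	if (freq == g->next_freq)
		return TOY_PATH_NONE;
	g->next_freq = freq;

	/* Fast path: the driver is called directly from scheduler context. */
	if (g->fast_switch_enabled)
		return TOY_PATH_FAST;

	/* Slow path: deferred to a kthread that may sleep and take a mutex. */
	return TOY_PATH_SLOW;
}
```

With a fast-switch driver such as intel_cpufreq in this thread, every committed update in the model goes down TOY_PATH_FAST, which is consistent with only seeing sugov_fast_switch() prints.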
Note that for the intel_cpufreq CPU scaling driver and the schedutil governor, I adjust the maximum clock frequency this way:
echo <any-low-percent> > /sys/devices/system/cpu/intel_pstate/max_perf_pct
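If the limit has to be changed repeatedly from a test program rather than by hand, the echo above can be done from C. A minimal sketch, assuming plain stdio file I/O on the sysfs attribute (root is needed on a real system); write_sysfs_ulong() and read_sysfs_ulong() are hypothetical helper names.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Minimal sketch of "echo <value> > <file>" from C, e.g. for
 * /sys/devices/system/cpu/intel_pstate/max_perf_pct.
 * Returns 0 on success, -1 on failure. Helper names are hypothetical.
 */
static int write_sysfs_ulong(const char *path, unsigned long value)
{
	FILE *f = fopen(path, "w");

	if (!f)
		return -1;
	if (fprintf(f, "%lu\n", value) < 0) {
		fclose(f);
		return -1;
	}
	/*
	 * stdio buffers the text; the actual write() happens at fclose(),
	 * so a value rejected by the kernel shows up as an fclose() error.
	 */
	return fclose(f) == 0 ? 0 : -1;
}

/* Read a single unsigned integer back from a file. */
static int read_sysfs_ulong(const char *path, unsigned long *value)
{
	FILE *f = fopen(path, "r");
	int ok;

	if (!f)
		return -1;
	ok = fscanf(f, "%lu", value) == 1;
	fclose(f);
	return ok ? 0 : -1;
}
```

For example, write_sysfs_ulong("/sys/devices/system/cpu/intel_pstate/max_perf_pct", 50) followed by read_sysfs_ulong() on the same path confirms that the new value was accepted.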
This should eventually call sugov_limits() in the schedutil governor; that can easily be checked with another print message.
I also applied the pr_info messages to the reverted kernel and re-did my tests (where everything works as expected). There is never a print message from sugov_work. There are messages from sugov_fast_switch.
That's fine.
Notes:
I do not know if
/sys/devices/system/cpu/cpufreq/policy*/scaling_max_freq
/sys/devices/system/cpu/cpufreq/policy*/scaling_min_freq
need to be accurate when using the intel_pstate driver in passive mode. They are not. The commit message for 9083e4986124389e2a7c0ffca95630a4983887f0 suggests that they might need to be representative. I wonder whether something similar to that commit is needed for other global changes, such as max_perf_pct and min_perf_pct.
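To compare what each policy reports against the limit set through max_perf_pct, the per-policy values can be dumped in one go. A minimal sketch using POSIX glob(); dump_scaling_values() is a hypothetical name, and the caller passes the pattern (for example one of the two policy* paths above).

```c
#include <assert.h>
#include <glob.h>
#include <stdio.h>

// Minimal sketch: print the value stored in every file matching a glob
// pattern such as "/sys/devices/system/cpu/cpufreq/policy*/scaling_max_freq".
// Returns the number of values read, or -1 if nothing matched.
// dump_scaling_values() is a hypothetical helper name.
static int dump_scaling_values(const char *pattern)
{
	glob_t g;
	int n = 0;

	if (glob(pattern, 0, NULL, &g) != 0) {
		globfree(&g); // glob() may have allocated even on failure
		return -1;
	}

	for (size_t i = 0; i < g.gl_pathc; i++) {
		FILE *f = fopen(g.gl_pathv[i], "r");
		unsigned long khz;

		if (!f)
			continue; // unreadable entry: skip it
		if (fscanf(f, "%lu", &khz) == 1) {
			printf("%s: %lu kHz\n", g.gl_pathv[i], khz);
			n++;
		}
		fclose(f);
	}
	globfree(&g);
	return n;
}
```

Calling it once with the scaling_max_freq pattern before and after writing max_perf_pct makes it easy to see whether the reported values follow the global limit.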
We are already calling intel_pstate_update_policies() in that case, so I believe it should be fine.
intel_cpufreq/ondemand doesn't work properly on the reverted kernel.
Reverted kernel? The patch you reverted was only for schedutil, and it shouldn't have anything to do with ondemand.
(just discovered, not investigated) I don't know about other governors.
When you do:
echo <any-low-percent> > /sys/devices/system/cpu/intel_pstate/max_perf_pct
How soon does the print from sugov_fast_switch() appear? Immediately? Check with both kernels, the one with my patch and the reverted one.
Also check whether the next_freq value differs between the two kernels when you change max_perf_pct.
FWIW, we now know the difference between intel-pstate and acpi-cpufreq/my test case, and why we see different results here. In the cases where my patch fixed the issue (acpi/ARM), we were really changing the limits, i.e. policy->min/max. This happened because we touched scaling_max_freq directly.
In the intel-pstate case, you are changing max_perf_pct, which doesn't change policy->max directly. I am not really sure how all of it works, but at least schedutil will not see policy->max change.
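The point about policy->max can be illustrated with a toy model. Everything here is hypothetical and simplified: the struct toy_limits fields, the helper names, and especially the percentage arithmetic, which is an assumption and not intel_pstate's actual computation. The governor side clamps only to policy_max_khz, while the driver side additionally applies max_perf_pct.

```c
#include <assert.h>

/*
 * Toy model (hypothetical names, simplified arithmetic): the governor
 * clamps its request to policy->max only, while a global percentage
 * limit such as max_perf_pct is applied separately by the driver.
 */
struct toy_limits {
	unsigned int cpuinfo_max_khz; /* hardware maximum */
	unsigned int policy_max_khz;  /* what schedutil sees (policy->max) */
	unsigned int max_perf_pct;    /* driver-internal global limit */
};

static unsigned int toy_min(unsigned int a, unsigned int b)
{
	return a < b ? a : b;
}

/* Frequency the governor would request: clamped only to policy->max. */
static unsigned int toy_governor_request(const struct toy_limits *l,
					 unsigned int wanted_khz)
{
	return toy_min(wanted_khz, l->policy_max_khz);
}

/* Frequency the driver would program: also clamped to the percentage. */
static unsigned int toy_driver_apply(const struct toy_limits *l,
				     unsigned int request_khz)
{
	unsigned int pct_max;

	assert(l->max_perf_pct <= 100);
	pct_max = (unsigned int)((unsigned long long)l->cpuinfo_max_khz *
				 l->max_perf_pct / 100);
	return toy_min(request_khz, pct_max);
}
```

In this model, lowering max_perf_pct changes only what toy_driver_apply() returns; toy_governor_request() is unaffected, so nothing on the governor side looks like a limits change. Whether that is what actually happens with intel_cpufreq is the open question below.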
@Rafael: Do you understand why things don't work properly with the intel_cpufreq driver?
I haven't tried to understand this yet, so no.
My somewhat educated guess is that using max_perf_pct has something to do with it, so I would retest to see whether there is any difference when scaling_max_freq is used instead.