On 17 March 2014 21:08, Lukasz Majewski <l.majewski(a)samsung.com> wrote:
>> Despite this patch set is working and applicable on top of 3.14-rc5,
>> please regard it solely as a pure RFC.
>>
>> This patch provides support for LAB governor build on top of ondemand.
>> Previous version of LAB can be found here:
>> http://thread.gmane.org/gmane.linux.kernel/1484746/match=cpufreq
>>
>> LAB short reminder:
>>
>> LAB uses information about how many cores are in "idle" state (the
>> core idleness is represented as the value between 0 and 100) and the
>> overall load of the system (from 0 to 100) to decide about frequency
>> to be set. It is extremely useful with SoCs like Exynos4412, which
>> can set only one frequency for all cores.
>>
>> Important design decisions:
>>
>> - Reuse well established ondemand governor's internal code. To do this
>> I had to expose some previously static internal ondemand code.
>> This allowed smaller LAB code when compared to previous version.
>>
>> - LAB works on top of ondemand, which means that one via device tree
>> attributes can specify if and when e.g. BOOST shall be enabled or
>> if any particular frequency shall be imposed. For situation NOT
>> important from the power consumption reduction viewpoint the ondemand
>> is used to set proper frequency.
>>
>> - It is only possible to either compile in or not the LAB into the
>> kernel. There is no "M" option for Kconfig. It is done on purpose,
>> since ondemand itself can be also compiled as a module and then it
>> would be possible to remove ondemand when LAB is working on top of it.
>>
>> - The LAB operation is specified (and thereof extendable) via device
>> tree lab-ctrl-freq attribute defined at /cpus/cpu0.
>>
>>
>> Problems:
>> - How the governor will work for big.LITTLE systems (especially
>> Global Task Scheduling).
>> - Will there be agreement to expose internal ondemand code to be
>> reused for more specialized governors.
>>
>> Test HW:
>> Exynos4412 - Trats2 board.
>> Above patches were posted on top of Linux 3.14-rc5
>> (SHA1: 3f9590c281c66162bf8ae9b7b2d987f0a89043c6)
>>
>
> Any comments about those patches?
Sorry for being late on reviewing these..
I tried to go through the patches but didn't looked at the minutest
of the details. Its been a long time when you first sent this patchset.
And the memories have corrupted by now :) ..
To get context back, can we discuss again the fundamentals behind
this new governor you are proposing. And then we can discuss about
it again, its pros/cons, etc..
I tried to go to earlier threads but I think we better do it again..
People are reluctant in getting another governor in and want to give
existing governors a try if possible.
So, please explain the basics behind your governor again and then
we can put our arguments again..
--
viresh
Whenever we change the frequency of a CPU, we call the PRECHANGE and POSTCHANGE
notifiers. They must be serialized, i.e. PRECHANGE and POSTCHANGE notifiers
should strictly alternate, thereby preventing two different sets of PRECHANGE or
POSTCHANGE notifiers from interleaving arbitrarily.
The following examples illustrate why this is important:
Scenario 1:
-----------
A thread reading the value of cpuinfo_cur_freq, will call
__cpufreq_cpu_get()->cpufreq_out_of_sync()->cpufreq_notify_transition()
The ondemand governor can decide to change the frequency of the CPU at the same
time and hence it can end up sending the notifications via ->target().
If the notifiers are not serialized, the following sequence can occur:
- PRECHANGE Notification for freq A (from cpuinfo_cur_freq)
- PRECHANGE Notification for freq B (from target())
- Freq changed by target() to B
- POSTCHANGE Notification for freq B
- POSTCHANGE Notification for freq A
We can see from the above that the last POSTCHANGE Notification happens for freq
A but the hardware is set to run at freq B.
Where would we break then?: adjust_jiffies() in cpufreq.c & cpufreq_callback()
in arch/arm/kernel/smp.c (which also adjusts the jiffies). All the
loops_per_jiffy calculations will get messed up.
Scenario 2:
-----------
The governor calls __cpufreq_driver_target() to change the frequency. At the
same time, if we change scaling_{min|max}_freq from sysfs, it will end up
calling the governor's CPUFREQ_GOV_LIMITS notification, which will also call
__cpufreq_driver_target(). And hence we end up issuing concurrent calls to
->target().
Typically, platforms have the following logic in their ->target() routines:
(Eg: cpufreq-cpu0, omap, exynos, etc)
A. If new freq is more than old: Increase voltage
B. Change freq
C. If new freq is less than old: decrease voltage
Now, if the two concurrent calls to ->target() are X and Y, where X is trying to
increase the freq and Y is trying to decrease it, we get the following race
condition:
X.A: voltage gets increased for larger freq
Y.A: nothing happens
Y.B: freq gets decreased
Y.C: voltage gets decreased
X.B: freq gets increased
X.C: nothing happens
Thus we can end up setting a freq which is not supported by the voltage we have
set. That will probably make the clock to the CPU unstable and the system might
not work properly anymore.
This patchset introduces a new set of routines cpufreq_freq_transition_begin()
and cpufreq_freq_transition_end(), which will guarantee that calls to frequency
transition routines are serialized. Later patches force other drivers to use
these new routines.
Srivatsa S. Bhat (1):
cpufreq: Make sure frequency transitions are serialized
Viresh Kumar (2):
cpufreq: Convert existing drivers to use
cpufreq_freq_transition_{begin|end}
cpufreq: Make cpufreq_notify_transition &
cpufreq_notify_post_transition static
drivers/cpufreq/cpufreq-nforce2.c | 4 +--
drivers/cpufreq/cpufreq.c | 52 +++++++++++++++++++++++++++++-------
drivers/cpufreq/exynos5440-cpufreq.c | 4 +--
drivers/cpufreq/gx-suspmod.c | 4 +--
drivers/cpufreq/integrator-cpufreq.c | 4 +--
drivers/cpufreq/longhaul.c | 4 +--
drivers/cpufreq/pcc-cpufreq.c | 4 +--
drivers/cpufreq/powernow-k6.c | 4 +--
drivers/cpufreq/powernow-k7.c | 4 +--
drivers/cpufreq/powernow-k8.c | 4 +--
drivers/cpufreq/s3c24xx-cpufreq.c | 4 +--
drivers/cpufreq/sh-cpufreq.c | 4 +--
drivers/cpufreq/unicore2-cpufreq.c | 4 +--
include/linux/cpufreq.h | 12 ++++++---
14 files changed, 76 insertions(+), 36 deletions(-)
--
1.7.12.rc2.18.g61b472e
From: Mark Brown <broonie(a)linaro.org>
Add support for parsing the explicit topology bindings to discover the
topology of the system.
Since it is not currently clear how to map multi-level clusters for the
scheduler all leaf clusters are presented to the scheduler at the same
level. This should be enough to provide good support for current systems.
Signed-off-by: Mark Brown <broonie(a)linaro.org>
---
Stylistic updates requested by Lorenzo and a fix for a missing error
check in DT parsing.
arch/arm64/kernel/topology.c | 172 +++++++++++++++++++++++++++++++++++++++++--
1 file changed, 166 insertions(+), 6 deletions(-)
diff --git a/arch/arm64/kernel/topology.c b/arch/arm64/kernel/topology.c
index 3e06b0be4ec8..548f04572e26 100644
--- a/arch/arm64/kernel/topology.c
+++ b/arch/arm64/kernel/topology.c
@@ -17,10 +17,163 @@
#include <linux/percpu.h>
#include <linux/node.h>
#include <linux/nodemask.h>
+#include <linux/of.h>
#include <linux/sched.h>
#include <asm/topology.h>
+#ifdef CONFIG_OF
+static int __init get_cpu_for_node(struct device_node *node)
+{
+ struct device_node *cpu_node;
+ int cpu;
+
+ cpu_node = of_parse_phandle(node, "cpu", 0);
+ if (!cpu_node)
+ return -1;
+
+ for_each_possible_cpu(cpu) {
+ if (of_get_cpu_node(cpu, NULL) == cpu_node)
+ return cpu;
+ }
+
+ pr_crit("Unable to find CPU node for %s\n", cpu_node->full_name);
+ return -1;
+}
+
+static int __init parse_core(struct device_node *core, int cluster_id,
+ int core_id)
+{
+ char name[10];
+ bool leaf = true;
+ int i = 0;
+ int cpu;
+ struct device_node *t;
+
+ do {
+ snprintf(name, sizeof(name), "thread%d", i);
+ t = of_get_child_by_name(core, name);
+ if (t) {
+ leaf = false;
+ cpu = get_cpu_for_node(t);
+ if (cpu >= 0) {
+ cpu_topology[cpu].cluster_id = cluster_id;
+ cpu_topology[cpu].core_id = core_id;
+ cpu_topology[cpu].thread_id = i;
+ } else {
+ pr_err("%s: Can't get CPU for thread\n",
+ t->full_name);
+ return -EINVAL;
+ }
+ }
+ i++;
+ } while (t);
+
+ cpu = get_cpu_for_node(core);
+ if (cpu >= 0) {
+ if (!leaf) {
+ pr_err("%s: Core has both threads and CPU\n",
+ core->full_name);
+ return -EINVAL;
+ }
+
+ cpu_topology[cpu].cluster_id = cluster_id;
+ cpu_topology[cpu].core_id = core_id;
+ } else if (leaf) {
+ pr_err("%s: Can't get CPU for leaf core\n", core->full_name);
+ return -EINVAL;
+ }
+
+ return 0;
+}
+
+static int __init parse_cluster(struct device_node *cluster, int depth)
+{
+ char name[10];
+ bool leaf = true;
+ bool has_cores = false;
+ struct device_node *c;
+ static int __initdata cluster_id;
+ int core_id = 0;
+ int i, ret;
+
+ /*
+ * First check for child clusters; we currently ignore any
+ * information about the nesting of clusters and present the
+ * scheduler with a flat list of them.
+ */
+ i = 0;
+ do {
+ snprintf(name, sizeof(name), "cluster%d", i);
+ c = of_get_child_by_name(cluster, name);
+ if (c) {
+ ret = parse_cluster(c, depth + 1);
+ if (ret != 0)
+ return ret;
+ leaf = false;
+ }
+ i++;
+ } while (c);
+
+ /* Now check for cores */
+ i = 0;
+ do {
+ snprintf(name, sizeof(name), "core%d", i);
+ c = of_get_child_by_name(cluster, name);
+ if (c) {
+ has_cores = true;
+
+ if (depth == 0)
+ pr_err("%s: cpu-map children should be clusters\n",
+ c->full_name);
+
+ if (leaf) {
+ ret = parse_core(c, cluster_id, core_id++);
+ if (ret != 0) {
+ return ret;
+ }
+ } else {
+ pr_err("%s: Non-leaf cluster with core %s\n",
+ cluster->full_name, name);
+ return -EINVAL;
+ }
+ }
+ i++;
+ } while (c);
+
+ if (leaf && !has_cores)
+ pr_warn("%s: empty cluster\n", cluster->full_name);
+
+ if (leaf)
+ cluster_id++;
+
+ return 0;
+}
+
+static int __init parse_dt_topology(void)
+{
+ struct device_node *cn;
+
+ cn = of_find_node_by_path("/cpus");
+ if (!cn) {
+ pr_err("No CPU information found in DT\n");
+ return 0;
+ }
+
+ /*
+ * When topology is provided cpu-map is essentially a root
+ * cluster with restricted subnodes.
+ */
+ cn = of_get_child_by_name(cn, "cpu-map");
+ if (!cn)
+ return 0;
+ return parse_cluster(cn, 0);
+}
+
+#else
+static inline int parse_dt_topology(void) { return 0; }
+#endif
+
/*
* cpu topology table
*/
@@ -74,15 +227,10 @@ void store_cpu_topology(unsigned int cpuid)
update_siblings_masks(cpuid);
}
-/*
- * init_cpu_topology is called at boot when only one cpu is running
- * which prevent simultaneous write access to cpu_topology array
- */
-void __init init_cpu_topology(void)
+static void __init reset_cpu_topology(void)
{
unsigned int cpu;
- /* init core mask and power*/
for_each_possible_cpu(cpu) {
struct cpu_topology *cpu_topo = &cpu_topology[cpu];
@@ -93,3 +241,15 @@ void __init init_cpu_topology(void)
cpumask_clear(&cpu_topo->thread_sibling);
}
}
+
+void __init init_cpu_topology(void)
+{
+ reset_cpu_topology();
+
+ /*
+ * Discard anything that was parsed if we hit an error so we
+ * don't use partial information.
+ */
+ if (parse_dt_topology())
+ reset_cpu_topology();
+}
--
1.9.0