From: Mark Brown <broonie(a)linaro.org>
As a legacy of the way 32-bit ARM did things, the topology code uses a null
topology map by default and then later overwrites it by mapping cores with no
information to a cluster by themselves. To make it simpler to reset things
when recovering from parse failures in firmware information, set this
configuration directly on init. A core will always be its own sibling, so
there should be no risk of confusion with firmware-provided information.
Signed-off-by: Mark Brown <broonie(a)linaro.org>
---
arch/arm64/kernel/topology.c | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/arch/arm64/kernel/topology.c b/arch/arm64/kernel/topology.c
index 3e06b0be4ec8..ff662b23af5f 100644
--- a/arch/arm64/kernel/topology.c
+++ b/arch/arm64/kernel/topology.c
@@ -43,9 +43,6 @@ static void update_siblings_masks(unsigned int cpuid)
* reset it to default behaviour
*/
pr_debug("CPU%u: No topology information configured\n", cpuid);
- cpuid_topo->core_id = 0;
- cpumask_set_cpu(cpuid, &cpuid_topo->core_sibling);
- cpumask_set_cpu(cpuid, &cpuid_topo->thread_sibling);
return;
}
@@ -87,9 +84,12 @@ void __init init_cpu_topology(void)
struct cpu_topology *cpu_topo = &cpu_topology[cpu];
cpu_topo->thread_id = -1;
- cpu_topo->core_id = -1;
+ cpu_topo->core_id = 0;
cpu_topo->cluster_id = -1;
+
cpumask_clear(&cpu_topo->core_sibling);
+ cpumask_set_cpu(cpu, &cpu_topo->core_sibling);
cpumask_clear(&cpu_topo->thread_sibling);
+ cpumask_set_cpu(cpu, &cpu_topo->thread_sibling);
}
}
--
1.9.1
This patchset was previously part of the larger tasks packing patchset [1].
I have split the latter into 3 different patchsets (at least) to make
things easier.
- configuration of sched_domain topology (this patchset)
- update and consolidation of cpu_power
- tasks packing algorithm
Based on Peter Z's proposal [2][3], this patchset modifies the way the
sched_domain levels are configured so that architectures can add specific
levels, like the current BOOK level or the proposed power gating level for
the ARM architecture.
[1] https://lkml.org/lkml/2013/10/18/121
[2] https://lkml.org/lkml/2013/11/5/239
[3] https://lkml.org/lkml/2013/11/5/449
Changes since v2:
- remove patch 1 as it has already been applied to the metag tree for v3.15
- update some commit messages
- add new flag description in TOPOLOGY_SD_FLAGS description
Changes since v1:
- move sched_domains_curr_level back under #ifdef CONFIG_NUMA
- use function pointer to set flag instead of a plain value.
- add list of tunable flags in the commit message of patch 2
- add SD_SHARE_POWERDOMAIN flag for powerpc's SMT level
Vincent Guittot (6):
sched: rework of sched_domain topology definition
sched: s390: create a dedicated topology table
sched: powerpc: create a dedicated topology table
sched: add a new SD_SHARE_POWERDOMAIN for sched_domain
sched: ARM: create a dedicated scheduler topology table
sched: powerpc: Add SD_SHARE_POWERDOMAIN for SMT level
arch/arm/kernel/topology.c | 26 +++++
arch/ia64/include/asm/topology.h | 24 ----
arch/powerpc/kernel/smp.c | 31 +++--
arch/s390/include/asm/topology.h | 13 +--
arch/s390/kernel/topology.c | 20 ++++
arch/tile/include/asm/topology.h | 33 ------
include/linux/sched.h | 49 +++++++-
include/linux/topology.h | 128 +++-----------------
kernel/sched/core.c | 244 ++++++++++++++++++++-------------------
9 files changed, 255 insertions(+), 313 deletions(-)
--
1.9.0
[Adding Linaro lists in cc as there are a few people here working on
power/thermal stuff.]
On 24 March 2014 15:30, Lukasz Majewski <l.majewski(a)samsung.com> wrote:
>> On 4 March 2014 15:57, Lukasz Majewski <l.majewski(a)samsung.com> wrote:
> I think, that "LAB" name is with us for some time, so it would be a
> pity to discard it.
It doesn't matter to mainline how you initially named your code :)
We need to pick the right name now, and the decision should be made
now (after discussions obviously) :)
>> What about making it as simple as:
>> - changing the ondemand governor only instead of adding a new governor
>
> My goal is to not touch the ondemand code. It has matured, so I would
> like to leave it as it is.
Because the boost feature is already part of the CPUFreq core, I think it's
better if we enhance the current governors to use it. So, I would like to
make this part of the existing governors. Not only ondemand but maybe
conservative as well..
Also, I feel we may not necessarily need to move this piece of code into
cpufreq. All you are doing is thermal management here :)
If we are sure we will not burn out our SoC (when many cores are idle),
run at max freq (if there is enough load of course :))..
And if there are chances that we might burn our chip (when very few
cores are idle), don't run on boost frequencies..
This is actually a 'cooling' device :)
Think of it this way: CPUFreq will provide a range of frequencies which
SoCs can use. And then based on some conditions we may or may not
want to run on these frequencies.
@Zhang/Eduardo: Can we have your inputs here as well ?
This may look hard but we need to design things in the best possible
way so that they are easier to manage in future. Let's see what others have
to say on this.
Whenever we change the frequency of a CPU, we call the PRECHANGE and POSTCHANGE
notifiers. They must be serialized, i.e. PRECHANGE and POSTCHANGE notifiers
should strictly alternate, thereby preventing two different sets of PRECHANGE or
POSTCHANGE notifiers from interleaving arbitrarily.
The following examples illustrate why this is important:
Scenario 1:
-----------
A thread reading the value of cpuinfo_cur_freq will call
__cpufreq_cpu_get()->cpufreq_out_of_sync()->cpufreq_notify_transition()
The ondemand governor can decide to change the frequency of the CPU at the same
time and hence it can end up sending the notifications via ->target().
If the notifiers are not serialized, the following sequence can occur:
- PRECHANGE Notification for freq A (from cpuinfo_cur_freq)
- PRECHANGE Notification for freq B (from target())
- Freq changed by target() to B
- POSTCHANGE Notification for freq B
- POSTCHANGE Notification for freq A
We can see from the above that the last POSTCHANGE Notification happens for freq
A but the hardware is set to run at freq B.
Where would we break then? In adjust_jiffies() in cpufreq.c and in
cpufreq_callback() in arch/arm/kernel/smp.c (which also adjusts the jiffies).
All the loops_per_jiffy calculations will get messed up.
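To make the interleaving above concrete, here is a minimal user-space sketch
in C. It is not kernel code: struct listener, notify() and the two replay
functions are hypothetical names made up for this illustration, and the
listener simply stands in for anything that rescales state on POSTCHANGE
(as the loops_per_jiffy users do). It only replays the notification order
listed above and reports whether the listener's view matches the hardware.

```c
#include <assert.h>

enum notify_event { PRECHANGE, POSTCHANGE };

/* Hypothetical listener: remembers the last frequency it was told about. */
struct listener {
	unsigned int believed_khz;
};

/* Only POSTCHANGE updates the listener's view in this model. */
static void notify(struct listener *l, enum notify_event ev,
		   unsigned int new_khz)
{
	assert(l);
	if (ev == POSTCHANGE)
		l->believed_khz = new_khz;
}

/* Replays the unserialized order from Scenario 1; returns 1 on mismatch. */
static int replay_interleaved(unsigned int freq_a, unsigned int freq_b)
{
	struct listener l = { 0 };
	unsigned int hw_khz;

	notify(&l, PRECHANGE, freq_a);	/* from the cpuinfo_cur_freq reader */
	notify(&l, PRECHANGE, freq_b);	/* from target() */
	hw_khz = freq_b;		/* target() programs the hardware */
	notify(&l, POSTCHANGE, freq_b);
	notify(&l, POSTCHANGE, freq_a);	/* stale notification arrives last */

	return l.believed_khz != hw_khz;
}

/* Same two transitions, but each PRE/POST pair completes before the next. */
static int replay_serialized(unsigned int freq_a, unsigned int freq_b)
{
	struct listener l = { 0 };
	unsigned int hw_khz;

	notify(&l, PRECHANGE, freq_a);
	notify(&l, POSTCHANGE, freq_a);
	notify(&l, PRECHANGE, freq_b);
	hw_khz = freq_b;
	notify(&l, POSTCHANGE, freq_b);

	return l.believed_khz != hw_khz;
}
```

With two different frequencies, replay_interleaved() reports a mismatch
while replay_serialized() does not, which is the whole point of keeping the
PRECHANGE/POSTCHANGE pairs strictly alternating.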
Scenario 2:
-----------
The governor calls __cpufreq_driver_target() to change the frequency. At the
same time, if we change scaling_{min|max}_freq from sysfs, it will end up
calling the governor's CPUFREQ_GOV_LIMITS notification, which will also call
__cpufreq_driver_target(). And hence we end up issuing concurrent calls to
->target().
Typically, platforms have the following logic in their ->target() routines:
(Eg: cpufreq-cpu0, omap, exynos, etc)
A. If new freq is more than old: Increase voltage
B. Change freq
C. If new freq is less than old: decrease voltage
Now, if the two concurrent calls to ->target() are X and Y, where X is trying to
increase the freq and Y is trying to decrease it, we get the following race
condition:
X.A: voltage gets increased for larger freq
Y.A: nothing happens
Y.B: freq gets decreased
Y.C: voltage gets decreased
X.B: freq gets increased
X.C: nothing happens
Thus we can end up setting a freq which is not supported by the voltage we have
set. That will probably make the clock to the CPU unstable and the system might
not work properly anymore.
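The race above can be replayed deterministically in a small user-space C
sketch. Everything here is an assumption made for illustration: struct
cpu_state, the step_a/step_b/step_c helpers (named after steps A, B and C
above) and the three-entry voltage table in required_uv() are hypothetical
and do not come from any real driver. Both callers sample the old frequency
on entry, as a typical ->target() routine would.

```c
#include <assert.h>

struct cpu_state {
	unsigned int khz;	/* current frequency */
	unsigned int uv;	/* current supply voltage in microvolts */
};

/* Hypothetical operating points: minimum voltage for a given frequency. */
static unsigned int required_uv(unsigned int khz)
{
	if (khz > 800000)
		return 1200000;
	if (khz > 400000)
		return 1000000;
	return 900000;
}

/* A. If new freq is more than old: increase voltage */
static void step_a(struct cpu_state *s, unsigned int old, unsigned int new)
{
	if (new > old)
		s->uv = required_uv(new);
}

/* B. Change freq */
static void step_b(struct cpu_state *s, unsigned int new)
{
	s->khz = new;
}

/* C. If new freq is less than old: decrease voltage */
static void step_c(struct cpu_state *s, unsigned int old, unsigned int new)
{
	if (new < old)
		s->uv = required_uv(new);
}

static int undervolted(const struct cpu_state *s)
{
	return s->uv < required_uv(s->khz);
}

/* Interleaving X.A, Y.A, Y.B, Y.C, X.B, X.C; returns 1 if undervolted. */
static int replay_race(void)
{
	struct cpu_state s = { 800000, required_uv(800000) };
	const unsigned int old = s.khz, x = 1200000, y = 400000;

	assert(!undervolted(&s));
	step_a(&s, old, x);
	step_a(&s, old, y);
	step_b(&s, y);
	step_c(&s, old, y);
	step_b(&s, x);
	step_c(&s, old, x);
	return undervolted(&s);
}

/* X runs to completion, then Y; returns 1 if undervolted. */
static int replay_serialized(void)
{
	struct cpu_state s = { 800000, required_uv(800000) };
	const unsigned int x = 1200000, y = 400000;
	unsigned int old = s.khz;

	assert(!undervolted(&s));
	step_a(&s, old, x);
	step_b(&s, x);
	step_c(&s, old, x);
	old = s.khz;
	step_a(&s, old, y);
	step_b(&s, y);
	step_c(&s, old, y);
	return undervolted(&s);
}
```

In this model the racy order ends at the highest frequency with the lowest
voltage, while the serialized order always leaves a supported pair.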
This patchset introduces a new set of routines cpufreq_freq_transition_begin()
and cpufreq_freq_transition_end(), which will guarantee that calls to frequency
transition routines are serialized. Later patches force other drivers to use
these new routines.
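As a rough idea of what such a begin/end pair has to guarantee, here is a
user-space analogue in C11. It is only a sketch under stated assumptions and
not the code from this patchset: struct policy_model and the transition_*()
helpers are hypothetical names, and the busy-wait loop stands in for whatever
waiting mechanism the kernel version uses. The only property it models is
that a second transition cannot begin until the first one has ended.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for per-policy state; not the kernel's struct. */
struct policy_model {
	atomic_bool transition_ongoing;
	unsigned int cur_khz;
};

/* Claim the transition if nobody else holds it; true on success. */
static bool transition_try_begin(struct policy_model *p)
{
	return !atomic_exchange(&p->transition_ongoing, true);
}

/* Wait until any ongoing transition has finished, then claim it. */
static void transition_begin(struct policy_model *p)
{
	while (!transition_try_begin(p))
		;	/* busy-wait; real code would sleep instead */
}

/* Publish the new frequency and let the next transition start. */
static void transition_end(struct policy_model *p, unsigned int new_khz)
{
	assert(atomic_load(&p->transition_ongoing));
	p->cur_khz = new_khz;
	atomic_store(&p->transition_ongoing, false);
}
```

A caller would wrap the PRECHANGE notification, the hardware change and the
POSTCHANGE notification between transition_begin() and transition_end(), so
that two such sequences can never interleave.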
V4: https://lkml.org/lkml/2014/3/21/23
V4->V5:
- Replaced false with 0 as the variable was of int type instead of bool.
- There were some discussions about requirement of a barrier, but it looks like
overkill for now. So, leaving that unless we have a real problem.
Srivatsa S. Bhat (1):
cpufreq: Make sure frequency transitions are serialized
Viresh Kumar (2):
cpufreq: Convert existing drivers to use
cpufreq_freq_transition_{begin|end}
cpufreq: Make cpufreq_notify_transition &
cpufreq_notify_post_transition static
drivers/cpufreq/cpufreq-nforce2.c | 4 +--
drivers/cpufreq/cpufreq.c | 52 +++++++++++++++++++++++++++++-------
drivers/cpufreq/exynos5440-cpufreq.c | 4 +--
drivers/cpufreq/gx-suspmod.c | 4 +--
drivers/cpufreq/integrator-cpufreq.c | 4 +--
drivers/cpufreq/longhaul.c | 4 +--
drivers/cpufreq/pcc-cpufreq.c | 4 +--
drivers/cpufreq/powernow-k6.c | 4 +--
drivers/cpufreq/powernow-k7.c | 4 +--
drivers/cpufreq/powernow-k8.c | 4 +--
drivers/cpufreq/s3c24xx-cpufreq.c | 4 +--
drivers/cpufreq/sh-cpufreq.c | 4 +--
drivers/cpufreq/unicore2-cpufreq.c | 4 +--
include/linux/cpufreq.h | 12 ++++++---
14 files changed, 76 insertions(+), 36 deletions(-)
--
1.7.12.rc2.18.g61b472e
During suspend, we first stop governors and then suspend cpufreq drivers.
Resume must be the exact opposite of that, i.e. resume drivers first and then
start governors.
But the current code in resume starts governors first and then resumes drivers.
Fix it by changing the code sequence there.
Signed-off-by: Viresh Kumar <viresh.kumar(a)linaro.org>
---
For 3.15-rc2 ..
drivers/cpufreq/cpufreq.c | 9 ++++-----
1 file changed, 4 insertions(+), 5 deletions(-)
diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
index 3aa7a7a..d8d6bc9 100644
--- a/drivers/cpufreq/cpufreq.c
+++ b/drivers/cpufreq/cpufreq.c
@@ -1652,14 +1652,13 @@ void cpufreq_resume(void)
cpufreq_suspended = false;
list_for_each_entry(policy, &cpufreq_policy_list, policy_list) {
- if (__cpufreq_governor(policy, CPUFREQ_GOV_START)
+ if (cpufreq_driver->resume && cpufreq_driver->resume(policy))
+ pr_err("%s: Failed to resume driver: %p\n", __func__,
+ policy);
+ else if (__cpufreq_governor(policy, CPUFREQ_GOV_START)
|| __cpufreq_governor(policy, CPUFREQ_GOV_LIMITS))
pr_err("%s: Failed to start governor for policy: %p\n",
__func__, policy);
- else if (cpufreq_driver->resume
- && cpufreq_driver->resume(policy))
- pr_err("%s: Failed to resume driver: %p\n", __func__,
- policy);
/*
* schedule call cpufreq_update_policy() for boot CPU, i.e. last
--
1.7.12.rc2.18.g61b472e
On 17 March 2014 21:08, Lukasz Majewski <l.majewski(a)samsung.com> wrote:
>> Despite this patch set is working and applicable on top of 3.14-rc5,
>> please regard it solely as a pure RFC.
>>
>> This patch provides support for LAB governor build on top of ondemand.
>> Previous version of LAB can be found here:
>> http://thread.gmane.org/gmane.linux.kernel/1484746/match=cpufreq
>>
>> LAB short reminder:
>>
>> LAB uses information about how many cores are in "idle" state (the
>> core idleness is represented as the value between 0 and 100) and the
>> overall load of the system (from 0 to 100) to decide about frequency
>> to be set. It is extremely useful with SoCs like Exynos4412, which
>> can set only one frequency for all cores.
>>
>> Important design decisions:
>>
>> - Reuse well established ondemand governor's internal code. To do this
>> I had to expose some previously static internal ondemand code.
>> This allowed smaller LAB code when compared to previous version.
>>
>> - LAB works on top of ondemand, which means that one via device tree
>> attributes can specify if and when e.g. BOOST shall be enabled or
>> if any particular frequency shall be imposed. For situation NOT
>> important from the power consumption reduction viewpoint the ondemand
>> is used to set proper frequency.
>>
>> - It is only possible to either compile in or not the LAB into the
>> kernel. There is no "M" option for Kconfig. It is done on purpose,
>> since ondemand itself can be also compiled as a module and then it
>> would be possible to remove ondemand when LAB is working on top of it.
>>
>> - The LAB operation is specified (and thereof extendable) via device
>> tree lab-ctrl-freq attribute defined at /cpus/cpu0.
>>
>>
>> Problems:
>> - How the governor will work for big.LITTLE systems (especially
>> Global Task Scheduling).
>> - Will there be agreement to expose internal ondemand code to be
>> reused for more specialized governors.
>>
>> Test HW:
>> Exynos4412 - Trats2 board.
>> Above patches were posted on top of Linux 3.14-rc5
>> (SHA1: 3f9590c281c66162bf8ae9b7b2d987f0a89043c6)
>>
>
> Any comments about those patches?
Sorry for being late on reviewing these..
I tried to go through the patches but didn't look at the minutest
of the details. It's been a long time since you first sent this patchset,
and the memories have been corrupted by now :) ..
To get the context back, can we discuss again the fundamentals behind
this new governor you are proposing? And then we can discuss
its pros/cons, etc..
I tried to go through the earlier threads but I think we'd better do it again..
People are reluctant to get another governor in and want to give
existing governors a try if possible.
So, please explain the basics behind your governor again and then
we can put our arguments again..
--
viresh