Currently the cpuidle drivers are spread across the different archs.
Patch submissions for cpuidle follow different paths: the cpuidle core
code goes to linux-pm, the ARM drivers go to arm-soc or the SoC-specific
tree, sh goes through the sh arch tree, pseries goes through PowerPC and
finally intel_idle goes through Len's tree while acpi_idle goes under linux-pm.
That makes it difficult to consolidate the code and to propagate modifications
from the cpuidle core to the different drivers.
Fortunately, a movement has been initiated to put the cpuidle drivers into the
drivers/cpuidle directory, like cpuidle-calxeda.c and cpuidle-kirkwood.c.
Add an explicit maintainer entry in the MAINTAINERS file to clarify the situation
and prevent new cpuidle drivers from landing in an arch directory.
The upstreaming process is unchanged: Rafael takes the patches and merges them
into his tree, but with an Acked-by from the driver's maintainer. The driver's
header must therefore contain the name of the maintainer.
This organization will be the same as for cpufreq.
Signed-off-by: Daniel Lezcano <daniel.lezcano(a)linaro.org>
---
MAINTAINERS | 7 +++++++
drivers/cpuidle/cpuidle-calxeda.c | 4 +++-
drivers/cpuidle/cpuidle-kirkwood.c | 5 +++--
3 files changed, 13 insertions(+), 3 deletions(-)
diff --git a/MAINTAINERS b/MAINTAINERS
index 61677c3..effa0f3 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -2206,6 +2206,13 @@ S: Maintained
F: drivers/cpufreq/
F: include/linux/cpufreq.h
+CPUIDLE DRIVERS
+M: Rafael J. Wysocki <rjw(a)sisk.pl>
+L: linux-pm(a)vger.kernel.org
+S: Maintained
+F: drivers/cpuidle/*
+F: include/linux/cpuidle.h
+
CPU FREQUENCY DRIVERS - ARM BIG LITTLE
M: Viresh Kumar <viresh.kumar(a)linaro.org>
M: Sudeep KarkadaNagesha <sudeep.karkadanagesha(a)arm.com>
diff --git a/drivers/cpuidle/cpuidle-calxeda.c b/drivers/cpuidle/cpuidle-calxeda.c
index e344b56..2378c39 100644
--- a/drivers/cpuidle/cpuidle-calxeda.c
+++ b/drivers/cpuidle/cpuidle-calxeda.c
@@ -1,7 +1,6 @@
/*
* Copyright 2012 Calxeda, Inc.
*
- * Based on arch/arm/plat-mxc/cpuidle.c:
* Copyright 2012 Freescale Semiconductor, Inc.
* Copyright 2012 Linaro Ltd.
*
@@ -16,6 +15,9 @@
*
* You should have received a copy of the GNU General Public License along with
* this program. If not, see <http://www.gnu.org/licenses/>.
+ *
+ * Author : Rob Herring <rob.herring(a)calxeda.com>
+ * Maintainer: Rob Herring <rob.herring(a)calxeda.com>
*/
#include <linux/cpuidle.h>
diff --git a/drivers/cpuidle/cpuidle-kirkwood.c b/drivers/cpuidle/cpuidle-kirkwood.c
index 53290e1..521b0a7 100644
--- a/drivers/cpuidle/cpuidle-kirkwood.c
+++ b/drivers/cpuidle/cpuidle-kirkwood.c
@@ -1,6 +1,4 @@
/*
- * arch/arm/mach-kirkwood/cpuidle.c
- *
* CPU idle Marvell Kirkwood SoCs
*
* This file is licensed under the terms of the GNU General Public
@@ -11,6 +9,9 @@
* to implement two idle states -
* #1 wait-for-interrupt
* #2 wait-for-interrupt and DDR self refresh
+ *
+ * Maintainer: Jason Cooper <jason(a)lakedaemon.net>
+ * Maintainer: Andrew Lunn <andrew(a)lunn.ch>
*/
#include <linux/kernel.h>
--
1.7.9.5
On 25 April 2013 08:16, Tang Yuantian-B29983 <B29983(a)freescale.com> wrote:
> It happened when policy->cpus contains *MORE THAN ONE CPU*.
> Taking my board T4240 for example, it has 3 clusters, with 8 CPUs per cluster.
> The log is:
> # insmod ppc-corenet-cpufreq.ko
> ppc_corenet_cpufreq: Freescale PowerPC corenet CPU frequency scaling driver
> # rmmod ppc-corenet-cpufreq.ko
> ERROR: Module ppc_corenet_cpufreq is in use
> # lsmod
> Module Size Used by
> ppc_corenet_cpufreq 6542 9
> # uname -a
> Linux T4240 3.9.0-rc1-11081-g34642bb-dirty #44 SMP Thu Apr 25 08:58:26 CST 2013 ppc64 unknown
>
> I am not using the newest kernel (since the new T4240 board is not included yet),
> but the issue is still there.
> The reason is just like what I said in patch.
I believed what you said was correct and went on to test this on my platform:
2 clusters with 2 and 3 CPUs... so I have multiple CPUs per cluster, i.e. per
policy structure.
insmod/rmmod worked as expected without any issues.
So, for me there are no such issues. BTW, I tested this on the latest rc from
Linus and also on the latest code from linux-next.
I am sure the counts are well balanced and there are no such issues, at least
in the latest code.
On my SMP platform, which is made of 5 cores in 2 clusters, the nr_busy_cpus
field of the sched_group_power struct is not zero when the platform is fully
idle. The root cause is:
During the boot sequence, some CPUs reach the idle loop and set their
NOHZ_IDLE flag while waiting for other CPUs to boot. But the nr_busy_cpus
field is initialized later, with the assumption that all CPUs are in the busy
state, whereas some CPUs have already set their NOHZ_IDLE flag.
More generally, the NOHZ_IDLE flag must be initialized when new sched_domains
are created in order to ensure that NOHZ_IDLE and nr_busy_cpus stay aligned.
This condition could be ensured by adding a synchronize_rcu() call between the
destruction of the old sched_domains and the creation of the new ones, so the
NOHZ_IDLE flag would not be updated through an old sched_domain once it has
been initialized. But this solution introduces additional latency in the
rebuild sequence that is called during cpu hotplug.
As suggested by Frederic Weisbecker, another solution is to have the same
RCU lifecycle for both the NOHZ_IDLE flag and the sched_domain struct.
A new nohz_idle field is added to sched_domain so that both the status and the
sched_domain share the same RCU lifecycle and are always synchronized.
In addition, there is no longer any need to protect nohz_idle against
concurrent access, as it is only modified by two mutually exclusive functions
called by the local cpu.
This solution has been preferred to the creation of a new struct with an extra
pointer indirection for sched_domain.
The synchronization is done at the cost of:
- an additional indirection and an rcu_dereference for accessing nohz_idle;
- using only the nohz_idle field of the top sched_domain.
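For illustration, the update sequence after this patch has the following shape
(a condensed sketch of set_cpu_sd_state_idle() from the diff below, not
additional code): the flag and the counters are reached through the same
RCU-protected sched_domain pointer, so they cannot get out of sync with each
other.

	/* cpu == smp_processor_id() */
	rcu_read_lock();
	sd = rcu_dereference_check_sched_domain(cpu_rq(cpu)->sd);
	if (sd && !sd->nohz_idle) {
		sd->nohz_idle = 1;
		for (; sd; sd = sd->parent)
			atomic_dec(&sd->groups->sgp->nr_busy_cpus);
	}
	rcu_read_unlock();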
Change since v7:
- remove atomic access which is useless now.
- refactor the sequence that update nohz_idle status and nr_busy_cpus.
Change since v6:
- Add the flags in struct sched_domain instead of creating a sched_domain_rq.
Change since v5:
- minor variable and function name change.
- remove a useless null check before kfree
- fix a compilation error when NO_HZ is not set.
Change since v4:
- link both sched_domain and NOHZ_IDLE flag in one RCU object so
their states are always synchronized.
Change since V3:
- NOHZ flag is not cleared if a NULL domain is attached to the CPU
- Remove patch 2/2 which becomes useless with latest modifications
Change since V2:
- change the initialization to idle state instead of busy state so a CPU that
enters idle during the build of the sched_domain will not corrupt the
initialization state
Change since V1:
- remove the patch for SCHED softirq on an idle core use case as it was
a side effect of the other use cases.
Signed-off-by: Vincent Guittot <vincent.guittot(a)linaro.org>
---
include/linux/sched.h | 3 +++
kernel/sched/fair.c | 26 ++++++++++++++++----------
kernel/sched/sched.h | 1 -
3 files changed, 19 insertions(+), 11 deletions(-)
diff --git a/include/linux/sched.h b/include/linux/sched.h
index d35d2b6..22bcbe8 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -899,6 +899,9 @@ struct sched_domain {
unsigned int wake_idx;
unsigned int forkexec_idx;
unsigned int smt_gain;
+#ifdef CONFIG_NO_HZ
+ int nohz_idle; /* NOHZ IDLE status */
+#endif
int flags; /* See SD_* */
int level;
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 7a33e59..5db1817 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -5395,13 +5395,16 @@ static inline void set_cpu_sd_state_busy(void)
struct sched_domain *sd;
int cpu = smp_processor_id();
- if (!test_bit(NOHZ_IDLE, nohz_flags(cpu)))
- return;
- clear_bit(NOHZ_IDLE, nohz_flags(cpu));
-
rcu_read_lock();
- for_each_domain(cpu, sd)
+ sd = rcu_dereference_check_sched_domain(cpu_rq(cpu)->sd);
+
+ if (!sd || !sd->nohz_idle)
+ goto unlock;
+ sd->nohz_idle = 0;
+
+ for (; sd; sd = sd->parent)
atomic_inc(&sd->groups->sgp->nr_busy_cpus);
+unlock:
rcu_read_unlock();
}
@@ -5410,13 +5413,16 @@ void set_cpu_sd_state_idle(void)
struct sched_domain *sd;
int cpu = smp_processor_id();
- if (test_bit(NOHZ_IDLE, nohz_flags(cpu)))
- return;
- set_bit(NOHZ_IDLE, nohz_flags(cpu));
-
rcu_read_lock();
- for_each_domain(cpu, sd)
+ sd = rcu_dereference_check_sched_domain(cpu_rq(cpu)->sd);
+
+ if (!sd || sd->nohz_idle)
+ goto unlock;
+ sd->nohz_idle = 1;
+
+ for (; sd; sd = sd->parent)
atomic_dec(&sd->groups->sgp->nr_busy_cpus);
+unlock:
rcu_read_unlock();
}
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index cc03cfd..03b13c8 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -1187,7 +1187,6 @@ extern void account_cfs_bandwidth_used(int enabled, int was_enabled);
enum rq_nohz_flag_bits {
NOHZ_TICK_STOPPED,
NOHZ_BALANCE_KICK,
- NOHZ_IDLE,
};
#define nohz_flags(cpu) (&cpu_rq(cpu)->nohz_flags)
--
1.7.9.5
Hi,
Working as a newbie in the PMWG, I noticed I'm not able to resume my
pandaboard-es with the latest 3.9 kernel from Linus (configuration file
omap2plus_defconfig). Suspend/resume appears to work with the Linaro 12.11
release; I managed to wake it up with a USB keyboard. There is also
launchpad bug 989547 that is still open. Any updates on this issue?
Thanks,
Zoran
This patchset was called: "Create sched_select_cpu() and use it for workqueues"
for the first three versions.
Earlier discussions over v3, v2 and v1 can be found here:
https://lkml.org/lkml/2013/3/18/364
http://lists.linaro.org/pipermail/linaro-dev/2012-November/014344.html
http://www.mail-archive.com/linaro-dev@lists.linaro.org/msg13342.html
For power saving it is better to schedule work on CPUs that aren't idle, as
bringing a cpu/cluster out of an idle state can be very costly (both
performance and power wise). Earlier we tried to use the timer infrastructure
to take this decision, but we later found out that the scheduler gives even
better results, so we should use the scheduler for choosing the CPU on which
to schedule work.
In the workqueue subsystem, workqueues with the WQ_UNBOUND flag are the ones
that let the scheduler select the target CPU.
Here we are migrating a few users of workqueues to WQ_UNBOUND. These drivers
were found to be very active on an idle or lightly loaded system, and using
WQ_UNBOUND for them gave impressive results.
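For illustration only (a hedged sketch with hypothetical names, not one of the
patches below), a converted driver ends up with the following pattern: the work
is queued without naming a CPU and the scheduler places the worker wherever it
is cheapest.

	#include <linux/init.h>
	#include <linux/workqueue.h>

	static struct workqueue_struct *my_unbound_wq;	/* hypothetical */
	static struct work_struct my_work;

	static void my_work_fn(struct work_struct *work)
	{
		/* runs on a CPU chosen by the scheduler, not a fixed per-cpu worker */
	}

	static int __init my_driver_init(void)
	{
		my_unbound_wq = alloc_workqueue("my_unbound_wq",
						WQ_UNBOUND | WQ_FREEZABLE, 0);
		if (!my_unbound_wq)
			return -ENOMEM;

		INIT_WORK(&my_work, my_work_fn);
		queue_work(my_unbound_wq, &my_work);	/* no target CPU specified */
		return 0;
	}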
Setup:
-----
- ARM Vexpress TC2 - big.LITTLE CPU
- Core 0-1: A15, 2-4: A7
- rootfs: linaro-ubuntu-devel
This patchset has been tested on a big.LITTLE (heterogeneous) system but is
useful for homogeneous systems as well. During these tests audio was played in
the background using aplay.
Results:
-------
Cluster A15 Energy Cluster A7 Energy Total
------------------------- ----------------------- ------
Without this patchset (Energy in Joules):
---------------------------------------------------
0.151162 2.183545 2.334707
0.223730 2.687067 2.910797
0.289687 2.732702 3.022389
0.454198 2.745908 3.200106
0.495552 2.746465 3.242017
Average:
0.322866 2.619137 2.942003
With this patchset (Energy in Joules):
-----------------------------------------------
0.226421 2.283658 2.510079
0.151361 2.236656 2.388017
0.197726 2.249849 2.447575
0.221915 2.229446 2.451361
0.347098 2.257707 2.604805
Average:
0.2289042 2.2514632 2.4803674
The above tests were repeated multiple times; events were tracked using
trace-cmd and analysed with kernelshark. It was easily noticeable that the
idle time of many CPUs increased considerably, which eventually saved some
power.
PS: All the earlier Acks we got for the drivers are dropped here, as the
patches have been updated significantly.
V3->V4:
-------
- Dropped changes to kernel/sched directory and hence
sched_select_non_idle_cpu().
- Dropped queue_work_on_any_cpu()
- Created system_freezable_unbound_wq
- Changed all patches accordingly.
V2->V3:
-------
- Dropped changes into core queue_work() API, rather create *_on_any_cpu()
APIs
- Dropped running timers migration patch as that was broken
- Migrated few users of workqueues to use *_on_any_cpu() APIs.
Viresh Kumar (4):
workqueue: Add system wide system_freezable_unbound_wq
PHYLIB: queue work on unbound wq
block: queue work on unbound wq
fbcon: queue work on unbound wq
block/blk-core.c | 3 ++-
block/blk-ioc.c | 2 +-
block/genhd.c | 10 ++++++----
drivers/net/phy/phy.c | 9 +++++----
drivers/video/console/fbcon.c | 2 +-
include/linux/workqueue.h | 4 ++++
kernel/workqueue.c | 7 ++++++-
7 files changed, 25 insertions(+), 12 deletions(-)
--
1.7.12.rc2.18.g61b472e
This patch series provides some code consolidation across the different
cpuidle drivers. It contains two parts: the first one is the removal of
the time keeping flag, and the second one is a common initialization routine.
All the drivers use the en_core_tk_irqen flag, which means it is not necessary
to make the time computation optional. We can remove this flag and assume the
cpuidle framework always manages this operation.
The cpuidle initialization code is duplicated across the different drivers in
the same manner.
The repeating pattern is:
SMP:
cpuidle_register_driver(drv);
for_each_possible_cpu(cpu) {
dev = per_cpu(cpuidle_device, cpu);
cpuidle_register_device(dev);
}
UP:
cpuidle_register_driver(drv);
cpuidle_register_device(dev);
As on a UP machine the macro 'for_each_possible_cpu' is a one-iteration loop,
using the initialization loop from SMP on UP works as well.
The patchset does some cleanup for different drivers in order to make the init
code the same. Then it introduces a generic function:
cpuidle_register(struct cpuidle_driver *drv, struct cpumask *cpumask)
The cpumask is for the coupled idle states.
The drivers are then modified to take this new function into account and to
remove the duplicated code.
The benefit is observable in the diffstat: 332 lines of code removed.
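As an illustration of the end result (a hedged sketch with hypothetical names,
not a verbatim copy of one of the patches), the init code of a driver without
coupled states boils down to a single call; passing NULL as the cpumask means
there are no coupled idle states:

	static struct cpuidle_driver my_idle_driver = {
		.name		= "my_idle",		/* hypothetical */
		.owner		= THIS_MODULE,
		/* .states[] and .state_count filled in as before */
	};

	static int __init my_idle_init(void)
	{
		/* registers the driver and a cpuidle device per possible cpu */
		return cpuidle_register(&my_idle_driver, NULL);
	}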
Changelog:
- V3:
* folded patch 5/19 into 19/19, they were:
* ARM: imx: cpuidle: use init/exit common routine
* ARM: imx: cpuidle: create separate drivers for imx5/imx6
* removed rule to make cpuidle.o in the imx's Makefile
* split patch 1/19 into two, they are:
* [V3 patch 01/19] ARM: shmobile: cpuidle: remove shmobile_enter_wfi
* [V3 patch 02/19] ARM: shmobile: cpuidle: remove shmobile_enter_wfi prototype
- V2:
* fixed cpumask NULL test for coupled state in cpuidle_register
* added comment about structure copy
* changed printk to pr_err
* folded split message
* fixed return code in cpuidle_register
* updated Documentation/cpuidle/driver.txt
* added in the changelog dev->state_count is filled by cpuidle_enable_device
* fixed tag for tegra in the first line patch description
* fixed tegra2 removed tegra_tear_down_cpu = tegra20_tear_down_cpu;
- V1: Initial post
Tested-on: u8500
Tested-on: at91
Tested-on: intel i5
Tested-on: OMAP4
Compiled with and without CPU_IDLE for:
u8500, at91, davinci, exynos, imx5, imx6, kirkwood, multi_v7 (for calxeda),
omap2plus, s3c64, tegra1, tegra2, tegra3
Daniel Lezcano (19):
ARM: shmobile: cpuidle: remove shmobile_enter_wfi function
ARM: shmobile: cpuidle: remove shmobile_enter_wfi prototype
ARM: OMAP3: remove cpuidle_wrap_enter
cpuidle: remove en_core_tk_irqen flag
ARM: ux500: cpuidle: replace for_each_online_cpu by
for_each_possible_cpu
cpuidle: make a single register function for all
ARM: ux500: cpuidle: use init/exit common routine
ARM: at91: cpuidle: use init/exit common routine
ARM: OMAP3: cpuidle: use init/exit common routine
ARM: s3c64xx: cpuidle: use init/exit common routine
ARM: tegra: cpuidle: use init/exit common routine
ARM: shmobile: cpuidle: use init/exit common routine
ARM: OMAP4: cpuidle: use init/exit common routine
ARM: tegra: cpuidle: use init/exit common routine for tegra2
ARM: tegra: cpuidle: use init/exit common routine for tegra3
ARM: calxeda: cpuidle: use init/exit common routine
ARM: kirkwood: cpuidle: use init/exit common routine
ARM: davinci: cpuidle: use init/exit common routine
ARM: imx: cpuidle: use init/exit common routine
Documentation/cpuidle/driver.txt | 6 +
arch/arm/mach-at91/cpuidle.c | 18 +--
arch/arm/mach-davinci/cpuidle.c | 21 +---
arch/arm/mach-exynos/cpuidle.c | 1 -
arch/arm/mach-imx/Makefile | 2 +-
arch/arm/mach-imx/cpuidle-imx5.c | 40 +++++++
arch/arm/mach-imx/cpuidle-imx6q.c | 3 +-
arch/arm/mach-imx/cpuidle.c | 80 -------------
arch/arm/mach-imx/cpuidle.h | 10 +-
arch/arm/mach-imx/pm-imx5.c | 30 +----
arch/arm/mach-omap2/cpuidle34xx.c | 49 ++------
arch/arm/mach-omap2/cpuidle44xx.c | 23 +---
arch/arm/mach-s3c64xx/cpuidle.c | 15 +--
arch/arm/mach-shmobile/cpuidle.c | 11 +-
arch/arm/mach-shmobile/include/mach/common.h | 3 -
arch/arm/mach-shmobile/pm-sh7372.c | 2 -
arch/arm/mach-tegra/cpuidle-tegra114.c | 27 +----
arch/arm/mach-tegra/cpuidle-tegra20.c | 31 +----
arch/arm/mach-tegra/cpuidle-tegra30.c | 28 +----
arch/arm/mach-ux500/cpuidle.c | 33 +-----
arch/powerpc/platforms/pseries/processor_idle.c | 1 -
arch/sh/kernel/cpu/shmobile/cpuidle.c | 1 -
arch/x86/kernel/apm_32.c | 1 -
drivers/acpi/processor_idle.c | 1 -
drivers/cpuidle/cpuidle-calxeda.c | 53 +--------
drivers/cpuidle/cpuidle-kirkwood.c | 18 +--
drivers/cpuidle/cpuidle.c | 144 ++++++++++++++---------
drivers/idle/intel_idle.c | 1 -
include/linux/cpuidle.h | 20 ++--
29 files changed, 175 insertions(+), 498 deletions(-)
create mode 100644 arch/arm/mach-imx/cpuidle-imx5.c
delete mode 100644 arch/arm/mach-imx/cpuidle.c
--
1.7.9.5
While migrating to the common clock framework (CCF), I found that the FIMD
clocks were pulled down by the CCF.
If the CCF finds any clocks which have NOT been claimed by any of the drivers,
then such clocks are pulled down (gated) by the CCF.
Calling clk_prepare() for the FIMD clocks fixes the issue.
This patch also replaces clk_disable() with clk_unprepare() during exit, since
clk_prepare() is called in fimd_probe().
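For reference, a minimal sketch of the CCF pairing this patch relies on (the
clock name and probe function are hypothetical, not the FIMD ones): every
clk_enable() must be preceded by a clk_prepare(), and the teardown must mirror
it, which is why the remove path now calls clk_unprepare().

	#include <linux/clk.h>
	#include <linux/err.h>
	#include <linux/platform_device.h>

	static int example_probe(struct platform_device *pdev)
	{
		struct clk *clk;
		int ret;

		clk = devm_clk_get(&pdev->dev, "sclk_example");	/* hypothetical clock */
		if (IS_ERR(clk))
			return PTR_ERR(clk);

		ret = clk_prepare(clk);		/* may sleep, done once at probe */
		if (ret < 0)
			return ret;

		ret = clk_enable(clk);		/* safe in atomic context */
		if (ret < 0) {
			clk_unprepare(clk);
			return ret;
		}

		/* mirror image at teardown: clk_disable(), then clk_unprepare() */
		return 0;
	}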
Signed-off-by: Vikas Sajjan <vikas.sajjan(a)linaro.org>
---
Changes since v3:
- added clk_prepare() in fimd_probe() and clk_unprepare() in fimd_remove()
as suggested by Viresh Kumar <viresh.kumar(a)linaro.org>
Changes since v2:
- moved clk_prepare_enable() and clk_disable_unprepare() from
fimd_probe() to fimd_clock() as suggested by Inki Dae <inki.dae(a)samsung.com>
Changes since v1:
- added error checking for clk_prepare_enable() and also replaced
clk_disable() with clk_disable_unprepare() during exit.
---
drivers/gpu/drm/exynos/exynos_drm_fimd.c | 14 ++++++++++++--
1 file changed, 12 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/exynos/exynos_drm_fimd.c b/drivers/gpu/drm/exynos/exynos_drm_fimd.c
index 9537761..aa22370 100644
--- a/drivers/gpu/drm/exynos/exynos_drm_fimd.c
+++ b/drivers/gpu/drm/exynos/exynos_drm_fimd.c
@@ -934,6 +934,16 @@ static int fimd_probe(struct platform_device *pdev)
return ret;
}
+ ret = clk_prepare(ctx->bus_clk);
+ if (ret < 0)
+ return ret;
+
+ ret = clk_prepare(ctx->lcd_clk);
+ if (ret < 0) {
+ clk_unprepare(ctx->bus_clk);
+ return ret;
+ }
+
ctx->vidcon0 = pdata->vidcon0;
ctx->vidcon1 = pdata->vidcon1;
ctx->default_win = pdata->default_win;
@@ -981,8 +991,8 @@ static int fimd_remove(struct platform_device *pdev)
if (ctx->suspended)
goto out;
- clk_disable(ctx->lcd_clk);
- clk_disable(ctx->bus_clk);
+ clk_unprepare(ctx->lcd_clk);
+ clk_unprepare(ctx->bus_clk);
pm_runtime_set_suspended(dev);
pm_runtime_put_sync(dev);
--
1.7.9.5
Hi Nico & all,
We are doing some profiling on the TC2 board for the low power modes, and we
found some long latencies in the core/cluster power-on sequence, so we want to
confirm the questions below:
1. From our profiling results, we found that if core_A sends an IPI to core_B,
it takes about 954us for core_B to reach the function bL_entry_point (or the
function mcpm_entry_point in your later patches for mainline); that is a
really long interval.
The firmware we use is the 13.01 version (which supports the BX_ADDRx
registers), so the cluster level power-on sequence should be:
a) the DCC detects the nIRQOUT/nFIQOUT asserting;
b) the DCC powers on the corresponding cluster;
c) the core runs into the boot monitor code and finally uses the
BX_ADDRx register to jump to the function *bL_entry_point*;
Since the flows above are a black box for us, we suspect the time is consumed
by one of these steps; could you or the ARM guys help confirm this?
2. When we read the spec DAI0318D_v2p_ca15_a7_power_management.pdf and got
confirmation from ARM support, we learned that there is only cluster level
power down, controlled by the CA15_PWRDN_EN/CA7_PWRDN_EN bits.
At the core level, we can NOT independently power off a core if the other
cores in the same cluster are still powered on. But this conflicts with TC2's
power management code in tc2_pm.c.
We can see that the function *tc2_pm_down()* calls gic_cpu_if_down() to
disable the GIC's cpu interface; that means the core cannot receive interrupts
anymore and will run into WFI. After the core runs into WFI, if the DCC/SPC
detects interrupts on the GIC's nIRQOUT/nFIQOUT pins, then the DCC/SPC will
power on the core (or reset the core) to let it resume, and then software
needs to re-enable the GIC's cpu interface for itself.
Here the questions are:
a) in the function *tc2_pm_down()*, after the core runs into the WFI state,
though the DCC/SPC cannot power off the core if it is NOT the last man of the
cluster, the DCC/SPC will still reset the core, right?
b) how does the DCC/SPC decide whether the core wants to enter the C1 state or
only the "WFI" state? Does the DCC/SPC use the WAKE_INT_MASK bits as the flag?
--
Thx,
Leo Yan