This patch series provides some code consolidation across the different
cpuidle drivers. It contains two parts: the first is the removal of
the time keeping flag, and the second is a common initialization routine.
All the drivers use the en_core_tk_irqen flag, which means it is not necessary
to make the time computation optional. We can remove this flag and assume the
cpuidle framework always manages this operation.
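Concretely, once the flag is gone, the framework unconditionally wraps every
state entry with the time computation. A rough sketch of the core side (not
driver code, names simplified):

#include <linux/cpuidle.h>
#include <linux/ktime.h>

static int enter_state_sketch(struct cpuidle_device *dev,
                              struct cpuidle_driver *drv, int index)
{
        ktime_t time_start, time_end;
        int entered_state;

        time_start = ktime_get();
        entered_state = drv->states[index].enter(dev, drv, index);
        time_end = ktime_get();

        /* time keeping is now unconditional: no more flag test */
        dev->last_residency =
                (int)ktime_to_us(ktime_sub(time_end, time_start));

        return entered_state;
}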
The cpuidle initialization code is duplicated across the different drivers in
the same manner.
The repeating pattern is:
SMP:
cpuidle_register_driver(drv);
for_each_possible_cpu(cpu) {
dev = per_cpu(cpuidle_device, cpu);
cpuidle_register_device(dev);
}
UP:
cpuidle_register_driver(drv);
cpuidle_register_device(dev);
As on a UP machine the 'for_each_possible_cpu' macro is a single-iteration
loop, reusing the SMP initialization loop for UP just works.
The patchset does some cleanup in the different drivers in order to make the
init code the same. Then it introduces a generic function:
cpuidle_register(struct cpuidle_driver *drv, struct cpumask *cpumask)
The cpumask is for the coupled idle states.
The drivers are then modified to take this new function into account and
to remove the duplicated code.
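With it, a typical driver init boils down to the sketch below (hypothetical
'foo' driver with the states table elided; NULL is passed since no coupled
idle states are used):

#include <linux/cpuidle.h>
#include <linux/init.h>
#include <linux/module.h>

static struct cpuidle_driver foo_idle_driver = {
        .name = "foo_idle",
        .owner = THIS_MODULE,
        /* .states[] and .state_count elided */
};

static int __init foo_idle_init(void)
{
        /* registers the driver and a cpuidle device for each possible
           cpu in one call; NULL means no coupled idle states */
        return cpuidle_register(&foo_idle_driver, NULL);
}
device_initcall(foo_idle_init);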
The benefit is observable in the diffstat: 332 lines of code removed.
Changelog:
- V4:
* Added the different Acked-by and Reviewed-by
* Removed the patches already merged:
* ARM: shmobile: cpuidle: remove shmobile_enter_wfi function
* ARM: shmobile: cpuidle: remove shmobile_enter_wfi prototype
* ARM: OMAP3: remove cpuidle_wrap_enter
* Removed patch without acked-by (no answer from the maintainer)
* ARM: s3c64xx: cpuidle: use init/exit common routine
- V3:
* folded patch 5/19 into 19/19, they were:
* ARM: imx: cpuidle: use init/exit common routine
* ARM: imx: cpuidle: create separate drivers for imx5/imx6
* removed rule to make cpuidle.o in the imx's Makefile
* split patch 1/19 into two, they are:
* [V3 patch 01/19] ARM: shmobile: cpuidle: remove shmobile_enter_wfi
* [V3 patch 02/19] ARM: shmobile: cpuidle: remove shmobile_enter_wfi prototype
- V2:
* fixed cpumask NULL test for coupled state in cpuidle_register
* added comment about structure copy
* replaced printk with pr_err
* folded the split message
* fixed return code in cpuidle_register
* updated Documentation/cpuidle/drivers.txt
* added in the changelog dev->state_count is filled by cpuidle_enable_device
* fixed tag for tegra in the first line patch description
* fixed tegra2 removed tegra_tear_down_cpu = tegra20_tear_down_cpu;
- V1: Initial post
Tested-on: u8500
Tested-on: at91
Tested-on: intel i5
Tested-on: OMAP4
Tested-by: Kevin Hilman <khilman(a)linaro.org> # OMAP3, OMAP4
Tested-by: Andrew Lunn <andrew(a)lunn.ch> # Kirkwood
Compiled with and without CPU_IDLE for:
u8500, at91, davinci, exynos, imx5, imx6, kirkwood, multi_v7 (for calxeda),
omap2plus, s3c64, tegra1, tegra2, tegra3
Daniel Lezcano (15):
cpuidle: remove en_core_tk_irqen flag
ARM: ux500: cpuidle: replace for_each_online_cpu by
for_each_possible_cpu
cpuidle: make a single register function for all
ARM: ux500: cpuidle: use init/exit common routine
ARM: at91: cpuidle: use init/exit common routine
ARM: OMAP3: cpuidle: use init/exit common routine
ARM: tegra: cpuidle: use init/exit common routine
ARM: shmobile: cpuidle: use init/exit common routine
ARM: OMAP4: cpuidle: use init/exit common routine
ARM: tegra: cpuidle: use init/exit common routine for tegra2
ARM: tegra: cpuidle: use init/exit common routine for tegra3
ARM: calxeda: cpuidle: use init/exit common routine
ARM: kirkwood: cpuidle: use init/exit common routine
ARM: davinci: cpuidle: use init/exit common routine
ARM: imx: cpuidle: use init/exit common routine
Documentation/cpuidle/driver.txt | 6 +
arch/arm/mach-at91/cpuidle.c | 18 +--
arch/arm/mach-davinci/cpuidle.c | 21 +---
arch/arm/mach-exynos/cpuidle.c | 1 -
arch/arm/mach-imx/Makefile | 2 +-
arch/arm/mach-imx/cpuidle-imx5.c | 37 ++++++
arch/arm/mach-imx/cpuidle-imx6q.c | 3 +-
arch/arm/mach-imx/cpuidle.c | 80 -------------
arch/arm/mach-imx/cpuidle.h | 10 +-
arch/arm/mach-imx/pm-imx5.c | 30 +----
arch/arm/mach-omap2/cpuidle34xx.c | 23 +---
arch/arm/mach-omap2/cpuidle44xx.c | 27 +----
arch/arm/mach-s3c64xx/cpuidle.c | 1 -
arch/arm/mach-shmobile/cpuidle.c | 11 +-
arch/arm/mach-shmobile/pm-sh7372.c | 1 -
arch/arm/mach-tegra/cpuidle-tegra114.c | 27 +----
arch/arm/mach-tegra/cpuidle-tegra20.c | 31 +----
arch/arm/mach-tegra/cpuidle-tegra30.c | 28 +----
arch/arm/mach-ux500/cpuidle.c | 33 +-----
arch/powerpc/platforms/pseries/processor_idle.c | 1 -
arch/sh/kernel/cpu/shmobile/cpuidle.c | 1 -
arch/x86/kernel/apm_32.c | 1 -
drivers/acpi/processor_idle.c | 1 -
drivers/cpuidle/cpuidle-calxeda.c | 53 +--------
drivers/cpuidle/cpuidle-kirkwood.c | 18 +--
drivers/cpuidle/cpuidle.c | 144 ++++++++++++++---------
drivers/idle/intel_idle.c | 1 -
include/linux/cpuidle.h | 20 ++--
28 files changed, 162 insertions(+), 468 deletions(-)
create mode 100644 arch/arm/mach-imx/cpuidle-imx5.c
delete mode 100644 arch/arm/mach-imx/cpuidle.c
--
1.7.9.5
commit d1669912 (idle: Implement generic idle function) added a new
generic idle loop along with support for hlt/nohlt command line options to
override the default idle loop behavior. However, the command-line
processing is never compiled in.
The command-line handling is wrapped in CONFIG_GENERIC_IDLE_POLL_SETUP,
and arches that use this feature select it in their Kconfigs.
However, no Kconfig definition was created for this option, so it is
never enabled, and therefore command-line override of the idle-loop
behavior is broken after migrating to the generic idle loop.
To fix this, add a Kconfig definition for GENERIC_IDLE_POLL_SETUP.
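For reference, the handlers guarded by this symbol look roughly like the
sketch below (adapted from the generic idle loop; an arch opts in by adding
"select GENERIC_IDLE_POLL_SETUP" to its Kconfig entry):

#ifdef CONFIG_GENERIC_IDLE_POLL_SETUP
static int __init cpu_idle_poll_setup(char *__unused)
{
        /* "nohlt" forces the idle loop to poll instead of halting */
        cpu_idle_force_poll = 1;
        return 1;
}
__setup("nohlt", cpu_idle_poll_setup);

static int __init cpu_idle_nopoll_setup(char *__unused)
{
        /* "hlt" restores the default, non-polling idle behavior */
        cpu_idle_force_poll = 0;
        return 1;
}
__setup("hlt", cpu_idle_nopoll_setup);
#endif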
Tested on ARM (OMAP4/Panda) which enables the command-line overrides
by default.
Cc: Thomas Gleixner <tglx(a)linutronix.de>
Cc: Linus Torvalds <torvalds(a)linux-foundation.org>
Cc: Rusty Russell <rusty(a)rustcorp.com.au>
Cc: Paul McKenney <paulmck(a)linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz(a)infradead.org>
Cc: Srivatsa S. Bhat <srivatsa.bhat(a)linux.vnet.ibm.com>
Cc: Magnus Damm <magnus.damm(a)gmail.com>
Signed-off-by: Kevin Hilman <khilman(a)linaro.org>
---
Applies on tip/smp/hotplug where generic idle feature is added
arch/Kconfig | 3 +++
1 file changed, 3 insertions(+)
diff --git a/arch/Kconfig b/arch/Kconfig
index 1455579..e0ef57b 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -216,6 +216,9 @@ config USE_GENERIC_SMP_HELPERS
config GENERIC_SMP_IDLE_THREAD
bool
+config GENERIC_IDLE_POLL_SETUP
+ bool
+
# Select if arch init_task initializer is different to init/init_task.c
config ARCH_INIT_TASK
bool
--
1.8.2
I tested the Linaro 13.04 kernel on my Exynos5250 and found that USB devices
are not detected (i.e. nothing in lsusb). This used to work with Linaro 13.01.
dmesg showed that s3c-usbphy can't configure the phy mode (see below). Looking
at the samsung-usbphy.c file, it looks like sysreg is not defined in the
device tree file. Comparing against a patch from
http://comments.gmane.org/gmane.linux.usb.general/82597, I am thinking that
we are missing one property for usbphy-sys in exynos5250.dtsi.
Any comment?
usbphy {
        #address-cells = <1>;
        #size-cells = <1>;
        compatible = "samsung,exynos5250-usbphy";
        reg = <0x12130000 0x100>, <0x12100000 0x100>;
        ranges;

        usbphy-sys {
                /* USB device and host PHY_CONTROL registers */
                reg = <0x10040704 0x8>;
        };
};
-Wei
=== dmesg ===
samsung-usbphy s3c-usbphy: Can't get usb-phy sysreg cfg register
samsung-usbphy s3c-usbphy: Can't configure specified phy mode
samsung-usbphy s3c-usbphy: Can't configure specified phy mode
samsung-usbphy s3c-usbphy: Already power on PHY
samsung-usbphy s3c-usbphy: Can't configure specified phy mode
samsung-usbphy s3c-usbphy: Already power on PHY
=== code snip from samsung-usbphy.c ===
/*
* Not returning error code here, since this situation is not fatal.
* Few SoCs may not have this switch available
*/
if (sphy->sysreg == NULL)
        dev_warn(sphy->dev,
                 "Can't get usb-phy sysreg cfg register\n");
Hi Nico & all,
Now that Samsung has released their big.LITTLE phone, and the IKS code has
been released in its kernel source, it may be a good time
for us to learn the status of IKS on the Linaro side:
1. What is the plan for Linaro to release the IKS-related patches?
The MCPM-related patches are currently pending for the mainline merge, and
unfortunately IKS is based on the MCPM patches; so when will you send the IKS
patches to the public mailing list?
On the other hand, do you have a plan to merge the IKS-related patches first
into the Linaro ARM landing team's branch? If so, they would not depend on
mainline, which could be much more efficient.
2. Looking into Samsung's code, there is cluster switching which is
based on IKS (named IKCS); have you implemented the related
functionality in your code base?
--
Thx,
Leo Yan
== Linus Walleij linusw ==
=== Highlights ===
* Sent a last minute revert from the GPIO tree to Torvalds and
he pulled it in for the final v3.9 kernel.
* Spent something like a working day reviewing and commenting
on the DMA 40 patches for DMA migration to Device Tree.
* Reviewed USB DT patches.
* Last minute additions and fixups in the pinctrl and GPIO tree;
some of it will probably not go in with the first pull request,
and possibly I'll hold it back until v3.11 even. This includes a patch
to make the pinctrl mutex locking more fine-grained.
* Iterated U300 DT patches as a prerequisite for multiplatform
work on the U300.
=== Plans ===
* A short paternity leave 6/5->9/5 in May.
As noted elsewhere: the child is not newborn, she is 6 years
old, but we can stash this leave...
* Find all regressions for ux500 lurking in the linux-next tree.
* Convert Nomadik pinctrl driver to register GPIO ranges
from the gpiochip side.
* Test the PL08x patches on the Ericsson Research
PB11MPCore and submit platform data for using
pl08x DMA on that platform.
* Get hands dirty with regmap.
=== Issues ===
* Things have been hectic internally at ST-Ericsson diverting me
from Linaro work.
* I am spending roughly 30-60 mins every day on internal review
work on internal baseline and mainline patches-to-be.
Thanks,
Linus Walleij
=== David Long ===
=== Highlights ===
* I have a (mostly) working version of reorganized ARM uprobe support. I
have just given Tixy a monolithic patch in the hopes he can tell me if
my approach makes sense. I am continuing to clean up the code and
prepare it for a wider review.
* Completed travel arrangements for Dublin.
=== Plans ===
* Continue with uprobe/kprobe
* Start building systemtap
=== Issues ===
* None
-dl
=== Highlights ===
* Summarized the volatile ranges discussion I ran at lsf-mm:
http://permalink.gmane.org/gmane.linux.kernel.mm/98848
* The lsf-mm volatile ranges discussion was briefly covered by lwn:
https://lwn.net/Articles/548108/
* Reviewed DmitryP's netfilter idletimer patches
* Met with Zach and Karim for LPC Android minisummit planning
* Reviewed blueprints and held bi-weekly upstreaming hangout
* Discussed RTC vs persistent_clock confusion and issues on lkml
* Worked with Zoran on suspend/resume issue & general git/community
process stuff.
* Discussed DmitryP's thought of using Gerrit for Linaro test development
* Updated linaro.android tree to AOSP's -rc7 branch, but reverted when
Tixy saw some issues
* Worked with Tixy to get his cpufreq fix integrated into the
linaro-fixes branch and pushed upstream to AOSP
* Discussed ION build issues w/ Jesse Barker
* Worked on rebasing and reworking Minchan's and my volatile ranges
patches so they are more coherent and unified.
=== Plans ===
* Continue reworking the volatile ranges patchset and send to lkml
* Review tglx's clocksource unregister patches
* More LPC minisummit planning
* Probably more ION research
=== Issues ===
* NA
Following prior discussions (over private email) with the current maintainer
of the cpufreq framework (Rafael), I am adding myself as a co-maintainer of
the cpufreq framework. This will mostly cover the cpufreq core and the ARM
drivers, but is not restricted to them.
This also adds the path of the git tree through which cpufreq patches are
pulled in.
Signed-off-by: Viresh Kumar <viresh.kumar(a)linaro.org>
---
V1->V2:
- Added path of git tree too.
- Cc'd ARM SoC Maintainers.
MAINTAINERS | 2 ++
1 file changed, 2 insertions(+)
diff --git a/MAINTAINERS b/MAINTAINERS
index 68d376e..cbed63c 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -2211,9 +2211,11 @@ F: drivers/net/ethernet/ti/cpmac.c
CPU FREQUENCY DRIVERS
M: Rafael J. Wysocki <rjw(a)sisk.pl>
+M: Viresh Kumar <viresh.kumar(a)linaro.org>
L: cpufreq(a)vger.kernel.org
L: linux-pm(a)vger.kernel.org
S: Maintained
+T: git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm.git
F: drivers/cpufreq/
F: include/linux/cpufreq.h
--
1.7.12.rc2.18.g61b472e
Currently the cpuidle drivers are spread across the different archs.
The patch submissions for cpuidle follow different paths: the cpuidle core
code goes to linux-pm, the ARM drivers go to arm-soc or the SoC-specific
tree, sh goes through the sh arch tree, pseries goes through PowerPC and,
finally, intel goes through Len's tree while acpi_idle goes through linux-pm.
That makes it difficult to consolidate the code and to propagate modifications
from the cpuidle core to the different drivers.
Fortunately, a movement has been initiated to put the cpuidle drivers into
the drivers/cpuidle directory, like cpuidle-calxeda.c and cpuidle-kirkwood.c.
Add an explicit maintainer entry in the MAINTAINERS file to clarify the
situation and prevent new cpuidle drivers from going into an arch directory.
The upstreaming process is unchanged: Rafael takes the patches to merge them
into his tree, but with the acked-by from the driver's maintainer. So the
header must contain the name of the maintainer.
This organization will be the same as for cpufreq.
Signed-off-by: Daniel Lezcano <daniel.lezcano(a)linaro.org>
Acked-by: Linus Walleij <linus.walleij(a)linaro.org>
Acked-by: Andrew Lunn <andrew(a)lunn.ch> #for kirkwood
Acked-by: Jason Cooper <jason(a)lakedaemon.net> #for kirkwood
---
MAINTAINERS | 9 +++++++++
drivers/cpuidle/cpuidle-calxeda.c | 4 +++-
drivers/cpuidle/cpuidle-kirkwood.c | 5 +++--
3 files changed, 15 insertions(+), 3 deletions(-)
diff --git a/MAINTAINERS b/MAINTAINERS
index 61677c3..45ee6dc 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -2217,6 +2217,15 @@ F: drivers/cpufreq/arm_big_little.h
F: drivers/cpufreq/arm_big_little.c
F: drivers/cpufreq/arm_big_little_dt.c
+CPUIDLE DRIVERS
+M: Rafael J. Wysocki <rjw(a)sisk.pl>
+M: Daniel Lezcano <daniel.lezcano(a)linaro.org>
+L: linux-pm(a)vger.kernel.org
+S: Maintained
+T: git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm.git
+F: drivers/cpuidle/*
+F: include/linux/cpuidle.h
+
CPUID/MSR DRIVER
M: "H. Peter Anvin" <hpa(a)zytor.com>
S: Maintained
diff --git a/drivers/cpuidle/cpuidle-calxeda.c b/drivers/cpuidle/cpuidle-calxeda.c
index e344b56..2233791 100644
--- a/drivers/cpuidle/cpuidle-calxeda.c
+++ b/drivers/cpuidle/cpuidle-calxeda.c
@@ -1,7 +1,7 @@
/*
* Copyright 2012 Calxeda, Inc.
*
- * Based on arch/arm/plat-mxc/cpuidle.c:
+ * Based on arch/arm/plat-mxc/cpuidle.c: #v3.7
* Copyright 2012 Freescale Semiconductor, Inc.
* Copyright 2012 Linaro Ltd.
*
@@ -16,6 +16,8 @@
*
* You should have received a copy of the GNU General Public License along with
* this program. If not, see <http://www.gnu.org/licenses/>.
+ *
+ * Maintainer: Rob Herring <rob.herring(a)calxeda.com>
*/
#include <linux/cpuidle.h>
diff --git a/drivers/cpuidle/cpuidle-kirkwood.c b/drivers/cpuidle/cpuidle-kirkwood.c
index 53290e1..521b0a7 100644
--- a/drivers/cpuidle/cpuidle-kirkwood.c
+++ b/drivers/cpuidle/cpuidle-kirkwood.c
@@ -1,6 +1,4 @@
/*
- * arch/arm/mach-kirkwood/cpuidle.c
- *
* CPU idle Marvell Kirkwood SoCs
*
* This file is licensed under the terms of the GNU General Public
@@ -11,6 +9,9 @@
* to implement two idle states -
* #1 wait-for-interrupt
* #2 wait-for-interrupt and DDR self refresh
+ *
+ * Maintainer: Jason Cooper <jason(a)lakedaemon.net>
+ * Maintainer: Andrew Lunn <andrew(a)lunn.ch>
*/
#include <linux/kernel.h>
--
1.7.9.5
Following prior discussions (over private email) with the current maintainer
of the cpufreq framework (Rafael), I am adding myself as a co-maintainer of
the cpufreq framework. This will mostly cover the cpufreq core and the ARM
drivers, but is not restricted to them.
Signed-off-by: Viresh Kumar <viresh.kumar(a)linaro.org>
---
MAINTAINERS | 1 +
1 file changed, 1 insertion(+)
diff --git a/MAINTAINERS b/MAINTAINERS
index 68d376e..bcef513 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -2211,6 +2211,7 @@ F: drivers/net/ethernet/ti/cpmac.c
CPU FREQUENCY DRIVERS
M: Rafael J. Wysocki <rjw(a)sisk.pl>
+M: Viresh Kumar <viresh.kumar(a)linaro.org>
L: cpufreq(a)vger.kernel.org
L: linux-pm(a)vger.kernel.org
S: Maintained
--
1.7.12.rc2.18.g61b472e
Commit bf4d1b5ddb78f86078ac6ae0415802d5f0c68f92 brought the multiple driver
support. The code added a couple of new APIs to register a driver per cpu.
That led to some code complexity to handle the kernel config options when
the multiple driver support is enabled or not, complexity which is not really
necessary. The code has to work when the multiple driver support is not
enabled, and the multiple driver support has to be compatible with the old
API.
This patch removes the per-cpu API, which is not used by any driver yet but
was added for the HMP cpuidle drivers which will come soon, and replaces its
usage with a cpumask pointer in the cpuidle driver structure telling which
cpus are handled by the driver. That lets the cpuidle_[un]register_driver
API be used for the multiple driver support.
The current code, a bit poor in comments, has been commented and simplified.
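For illustration, an HMP platform could then register one driver per cluster
with nothing more than the sketch below (hypothetical driver and cluster
mask; only the cpumask assignment is new):

#include <linux/cpuidle.h>
#include <linux/cpumask.h>
#include <linux/module.h>

static struct cpuidle_driver big_idle_driver = {
        .name = "big_idle",
        .owner = THIS_MODULE,
        /* .states[] and .state_count elided */
};

static int __init big_idle_init(const struct cpumask *big_cluster_mask)
{
        /* tell the core which cpus this driver handles; a NULL cpumask
           defaults to cpu_possible_mask at registration time */
        big_idle_driver.cpumask = big_cluster_mask;

        return cpuidle_register_driver(&big_idle_driver);
}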
Signed-off-by: Daniel Lezcano <daniel.lezcano(a)linaro.org>
---
drivers/cpuidle/driver.c | 325 ++++++++++++++++++++++++++++------------------
include/linux/cpuidle.h | 21 +--
2 files changed, 212 insertions(+), 134 deletions(-)
diff --git a/drivers/cpuidle/driver.c b/drivers/cpuidle/driver.c
index 8dfaaae..2db96b5 100644
--- a/drivers/cpuidle/driver.c
+++ b/drivers/cpuidle/driver.c
@@ -18,206 +18,267 @@
DEFINE_SPINLOCK(cpuidle_driver_lock);
-static void __cpuidle_set_cpu_driver(struct cpuidle_driver *drv, int cpu);
-static struct cpuidle_driver * __cpuidle_get_cpu_driver(int cpu);
+#ifdef CONFIG_CPU_IDLE_MULTIPLE_DRIVERS
-static void cpuidle_setup_broadcast_timer(void *arg)
+static DEFINE_PER_CPU(struct cpuidle_driver *, cpuidle_drivers);
+
+/**
+ * __cpuidle_get_cpu_driver: returns the cpuidle driver tied with the specified
+ * cpu.
+ *
+ * @cpu: an integer specifying the cpu number
+ *
+ * Returns a pointer to struct cpuidle_driver, NULL if no driver has been
+ * registered for this driver
+ */
+static struct cpuidle_driver *__cpuidle_get_cpu_driver(int cpu)
{
- int cpu = smp_processor_id();
- clockevents_notify((long)(arg), &cpu);
+ return per_cpu(cpuidle_drivers, cpu);
}
-static void __cpuidle_driver_init(struct cpuidle_driver *drv, int cpu)
+/**
+ * __cpuidle_set_driver: assign to the per cpu variable the driver pointer for
+ * each cpu the driver is assigned to with the cpumask.
+ *
+ * @drv: a pointer to a struct cpuidle_driver
+ *
+ * Returns 0 on success, < 0 otherwise
+ */
+static inline int __cpuidle_set_driver(struct cpuidle_driver *drv)
{
- int i;
+ int cpu;
- drv->refcnt = 0;
+ for_each_cpu(cpu, drv->cpumask) {
- for (i = drv->state_count - 1; i >= 0 ; i--) {
+ if (__cpuidle_get_cpu_driver(cpu))
+ return -EBUSY;
- if (!(drv->states[i].flags & CPUIDLE_FLAG_TIMER_STOP))
- continue;
-
- drv->bctimer = 1;
- on_each_cpu_mask(get_cpu_mask(cpu), cpuidle_setup_broadcast_timer,
- (void *)CLOCK_EVT_NOTIFY_BROADCAST_ON, 1);
- break;
+ per_cpu(cpuidle_drivers, cpu) = drv;
}
+
+ return 0;
}
-static int __cpuidle_register_driver(struct cpuidle_driver *drv, int cpu)
+/**
+ * __cpuidle_unset_driver: for each cpu the driver is handling, set the per cpu
+ * variable driver to NULL.
+ *
+ * @drv: a pointer to a struct cpuidle_driver
+ */
+static inline void __cpuidle_unset_driver(struct cpuidle_driver *drv)
{
- if (!drv || !drv->state_count)
- return -EINVAL;
-
- if (cpuidle_disabled())
- return -ENODEV;
-
- if (__cpuidle_get_cpu_driver(cpu))
- return -EBUSY;
+ int cpu;
- __cpuidle_driver_init(drv, cpu);
+ for_each_cpu(cpu, drv->cpumask) {
- __cpuidle_set_cpu_driver(drv, cpu);
+ if (drv != __cpuidle_get_cpu_driver(cpu))
+ continue;
- return 0;
+ per_cpu(cpuidle_drivers, cpu) = NULL;
+ }
}
-static void __cpuidle_unregister_driver(struct cpuidle_driver *drv, int cpu)
-{
- if (drv != __cpuidle_get_cpu_driver(cpu))
- return;
+#else
- if (!WARN_ON(drv->refcnt > 0))
- __cpuidle_set_cpu_driver(NULL, cpu);
+static struct cpuidle_driver *cpuidle_curr_driver;
- if (drv->bctimer) {
- drv->bctimer = 0;
- on_each_cpu_mask(get_cpu_mask(cpu), cpuidle_setup_broadcast_timer,
- (void *)CLOCK_EVT_NOTIFY_BROADCAST_OFF, 1);
- }
+/**
+ * __cpuidle_get_cpu_driver: returns the global cpuidle driver pointer.
+ *
+ * @cpu: an integer specifying the cpu number, this parameter is ignored
+ *
+ * Returns a pointer to a struct cpuidle_driver, NULL if no driver was
+ * previously registered
+ */
+static inline struct cpuidle_driver *__cpuidle_get_cpu_driver(int cpu)
+{
+ return cpuidle_curr_driver;
}
-#ifdef CONFIG_CPU_IDLE_MULTIPLE_DRIVERS
+/**
+ * __cpuidle_set_driver: assign the cpuidle driver pointer to the global cpuidle
+ * driver variable.
+ *
+ * @drv: a pointer to a struct cpuidle_driver
+ *
+ * Returns 0 on success, < 0 otherwise
+ */
+static inline int __cpuidle_set_driver(struct cpuidle_driver *drv)
+{
+ if (cpuidle_curr_driver)
+ return -EBUSY;
-static DEFINE_PER_CPU(struct cpuidle_driver *, cpuidle_drivers);
+ cpuidle_curr_driver = drv;
-static void __cpuidle_set_cpu_driver(struct cpuidle_driver *drv, int cpu)
-{
- per_cpu(cpuidle_drivers, cpu) = drv;
+ return 0;
}
-static struct cpuidle_driver *__cpuidle_get_cpu_driver(int cpu)
+/**
+ * __cpuidle_unset_driver: reset the global cpuidle driver variable if the
+ * cpuidle driver pointer match it.
+ *
+ * @drv: a pointer to a struct cpuidle_driver
+ */
+static inline void __cpuidle_unset_driver(struct cpuidle_driver *drv)
{
- return per_cpu(cpuidle_drivers, cpu);
+ if (drv == cpuidle_curr_driver)
+ cpuidle_curr_driver = NULL;
}
-static void __cpuidle_unregister_all_cpu_driver(struct cpuidle_driver *drv)
+#endif
+
+/**
+ * cpuidle_setup_broadcast_timer: set the broadcast timer notification for the
+ * current cpu. This function is called per cpu context invoked by a smp cross
+ * call. It is not supposed to be called directly.
+ *
+ * @arg: a void pointer, actually used to match the smp cross call api but used
+ * as a long with two values:
+ * - CLOCK_EVT_NOTIFY_BROADCAST_ON
+ * - CLOCK_EVT_NOTIFY_BROADCAST_OFF
+ */
+static void cpuidle_setup_broadcast_timer(void *arg)
{
- int cpu;
- for_each_present_cpu(cpu)
- __cpuidle_unregister_driver(drv, cpu);
+ int cpu = smp_processor_id();
+ clockevents_notify((long)(arg), &cpu);
}
-static int __cpuidle_register_all_cpu_driver(struct cpuidle_driver *drv)
+/**
+ * __cpuidle_driver_init: initialize the driver internal data.
+ *
+ * @drv: a valid pointer to a struct cpuidle_driver
+ *
+ * Returns 0 on success, < 0 otherwise
+ */
+static int __cpuidle_driver_init(struct cpuidle_driver *drv)
{
- int ret = 0;
- int i, cpu;
+ int i;
- for_each_present_cpu(cpu) {
- ret = __cpuidle_register_driver(drv, cpu);
- if (ret)
- break;
- }
+ drv->refcnt = 0;
- if (ret)
- for_each_present_cpu(i) {
- if (i == cpu)
- break;
- __cpuidle_unregister_driver(drv, i);
- }
+ /*
+ * we default here to all cpu possible because if the kernel
+ * boots with some cpus offline and then we online one of them
+ * the cpu notifier won't know which driver to assign
+ */
+ if (!drv->cpumask)
+ drv->cpumask = cpu_possible_mask;
+
+ /*
+ * we look for the timer stop flag in the different states,
+ * so know we have to setup the broadcast timer. The loop is
+ * in reverse order, because usually the deeper state has this
+ * flag set
+ */
+ for (i = drv->state_count - 1; i >= 0 ; i--) {
+ if (!(drv->states[i].flags & CPUIDLE_FLAG_TIMER_STOP))
+ continue;
- return ret;
+ drv->bctimer = 1;
+ break;
+ }
+
+ return 0;
}
-int cpuidle_register_cpu_driver(struct cpuidle_driver *drv, int cpu)
+/**
+ * __cpuidle_register_driver: do some sanity checks, initializes the driver,
+ * assign the driver to the global cpuidle driver variable(s) and setup the
+ * broadcast timer if the cpuidle driver has some states which shutdown the
+ * local timer.
+ *
+ * @drv: a valid pointer to a struct cpuidle_driver
+ *
+ * Returns 0 on success, < 0 otherwise
+ */
+static int __cpuidle_register_driver(struct cpuidle_driver *drv)
{
int ret;
- spin_lock(&cpuidle_driver_lock);
- ret = __cpuidle_register_driver(drv, cpu);
- spin_unlock(&cpuidle_driver_lock);
+ if (!drv || !drv->state_count)
+ return -EINVAL;
- return ret;
-}
+ if (cpuidle_disabled())
+ return -ENODEV;
-void cpuidle_unregister_cpu_driver(struct cpuidle_driver *drv, int cpu)
-{
- spin_lock(&cpuidle_driver_lock);
- __cpuidle_unregister_driver(drv, cpu);
- spin_unlock(&cpuidle_driver_lock);
-}
+ ret = __cpuidle_driver_init(drv);
+ if (ret)
+ return ret;
-/**
- * cpuidle_register_driver - registers a driver
- * @drv: the driver
- */
-int cpuidle_register_driver(struct cpuidle_driver *drv)
-{
- int ret;
+ ret = __cpuidle_set_driver(drv);
+ if (ret)
+ return ret;
- spin_lock(&cpuidle_driver_lock);
- ret = __cpuidle_register_all_cpu_driver(drv);
- spin_unlock(&cpuidle_driver_lock);
+ if (drv->bctimer)
+ on_each_cpu_mask(drv->cpumask, cpuidle_setup_broadcast_timer,
+ (void *)CLOCK_EVT_NOTIFY_BROADCAST_ON, 1);
- return ret;
+ return 0;
}
-EXPORT_SYMBOL_GPL(cpuidle_register_driver);
/**
- * cpuidle_unregister_driver - unregisters a driver
- * @drv: the driver
+ * __cpuidle_unregister_driver: checks the driver is no longer in use, reset the
+ * global cpuidle driver variable(s) and disable the timer broadcast
+ * notification mechanism if it was in use.
+ *
+ * @drv: a valid pointer to a struct cpuidle_driver
+ *
+ * Returns 0 on success, < 0 otherwise
*/
-void cpuidle_unregister_driver(struct cpuidle_driver *drv)
+static void __cpuidle_unregister_driver(struct cpuidle_driver *drv)
{
- spin_lock(&cpuidle_driver_lock);
- __cpuidle_unregister_all_cpu_driver(drv);
- spin_unlock(&cpuidle_driver_lock);
-}
-EXPORT_SYMBOL_GPL(cpuidle_unregister_driver);
-
-#else
-
-static struct cpuidle_driver *cpuidle_curr_driver;
+ if (!WARN_ON(drv->refcnt > 0))
+ return;
-static inline void __cpuidle_set_cpu_driver(struct cpuidle_driver *drv, int cpu)
-{
- cpuidle_curr_driver = drv;
-}
+ __cpuidle_unset_driver(drv);
-static inline struct cpuidle_driver *__cpuidle_get_cpu_driver(int cpu)
-{
- return cpuidle_curr_driver;
+ if (drv->bctimer) {
+ drv->bctimer = 0;
+ on_each_cpu_mask(drv->cpumask, cpuidle_setup_broadcast_timer,
+ (void *)CLOCK_EVT_NOTIFY_BROADCAST_OFF, 1);
+ }
}
/**
- * cpuidle_register_driver - registers a driver
- * @drv: the driver
+ * cpuidle_register_driver: registers a driver by taking a lock to prevent
+ * multiple callers to [un]register a driver at the same time.
+ *
+ * @drv: a pointer to a valid struct cpuidle_driver
+ *
+ * Returns 0 on success, < 0 otherwise
*/
int cpuidle_register_driver(struct cpuidle_driver *drv)
{
- int ret, cpu;
+ int ret;
- cpu = get_cpu();
spin_lock(&cpuidle_driver_lock);
- ret = __cpuidle_register_driver(drv, cpu);
+ ret = __cpuidle_register_driver(drv);
spin_unlock(&cpuidle_driver_lock);
- put_cpu();
return ret;
}
EXPORT_SYMBOL_GPL(cpuidle_register_driver);
/**
- * cpuidle_unregister_driver - unregisters a driver
- * @drv: the driver
+ * cpuidle_unregister_driver: unregisters a driver by taking a lock to prevent
+ * multiple callers to [un]register a driver at the same time. The specified
+ * driver must match the driver currently registered.
+ *
+ * @drv: a pointer to a valid struct cpuidle_driver
*/
void cpuidle_unregister_driver(struct cpuidle_driver *drv)
{
- int cpu;
-
- cpu = get_cpu();
spin_lock(&cpuidle_driver_lock);
- __cpuidle_unregister_driver(drv, cpu);
+ __cpuidle_unregister_driver(drv);
spin_unlock(&cpuidle_driver_lock);
- put_cpu();
}
EXPORT_SYMBOL_GPL(cpuidle_unregister_driver);
-#endif
/**
- * cpuidle_get_driver - return the current driver
+ * cpuidle_get_driver: returns the driver tied with the current cpu.
+ *
+ * Returns a struct cpuidle_driver pointer, or NULL if no driver is registered
*/
struct cpuidle_driver *cpuidle_get_driver(void)
{
@@ -233,7 +294,12 @@ struct cpuidle_driver *cpuidle_get_driver(void)
EXPORT_SYMBOL_GPL(cpuidle_get_driver);
/**
- * cpuidle_get_cpu_driver - return the driver tied with a cpu
+ * cpuidle_get_cpu_driver: returns the driver registered with a cpu.
+ *
+ * @dev: a valid pointer to a struct cpuidle_device
+ *
+ * Returns a struct cpuidle_driver pointer, or NULL if no driver is registered
+ * for the specified cpu
*/
struct cpuidle_driver *cpuidle_get_cpu_driver(struct cpuidle_device *dev)
{
@@ -244,6 +310,13 @@ struct cpuidle_driver *cpuidle_get_cpu_driver(struct cpuidle_device *dev)
}
EXPORT_SYMBOL_GPL(cpuidle_get_cpu_driver);
+/**
+ * cpuidle_driver_ref: gets a refcount for the driver. Note this function takes
+ * a refcount for the driver assigned to the current cpu.
+ *
+ * Returns a struct cpuidle_driver pointer, or NULL if no driver is registered
+ * for the current cpu
+ */
struct cpuidle_driver *cpuidle_driver_ref(void)
{
struct cpuidle_driver *drv;
@@ -257,6 +330,10 @@ struct cpuidle_driver *cpuidle_driver_ref(void)
return drv;
}
+/**
+ * cpuidle_driver_unref: puts down the refcount for the driver. Note this
+ * function decrement the refcount for the driver assigned to the current cpu.
+ */
void cpuidle_driver_unref(void)
{
struct cpuidle_driver *drv = cpuidle_get_driver();
diff --git a/include/linux/cpuidle.h b/include/linux/cpuidle.h
index 3c86faa..e7a94db 100644
--- a/include/linux/cpuidle.h
+++ b/include/linux/cpuidle.h
@@ -101,16 +101,20 @@ static inline int cpuidle_get_last_residency(struct cpuidle_device *dev)
****************************/
struct cpuidle_driver {
- const char *name;
- struct module *owner;
- int refcnt;
+ const char *name;
+ struct module *owner;
+ int refcnt;
/* used by the cpuidle framework to setup the broadcast timer */
- unsigned int bctimer:1;
+ unsigned int bctimer:1;
+
/* states array must be ordered in decreasing power consumption */
- struct cpuidle_state states[CPUIDLE_STATE_MAX];
- int state_count;
- int safe_state_index;
+ struct cpuidle_state states[CPUIDLE_STATE_MAX];
+ int state_count;
+ int safe_state_index;
+
+ /* the driver handles the cpus in cpumask */
+ const struct cpumask *cpumask;
};
#ifdef CONFIG_CPU_IDLE
@@ -135,9 +139,6 @@ extern void cpuidle_disable_device(struct cpuidle_device *dev);
extern int cpuidle_play_dead(void);
extern struct cpuidle_driver *cpuidle_get_cpu_driver(struct cpuidle_device *dev);
-extern int cpuidle_register_cpu_driver(struct cpuidle_driver *drv, int cpu);
-extern void cpuidle_unregister_cpu_driver(struct cpuidle_driver *drv, int cpu);
-
#else
static inline void disable_cpuidle(void) { }
static inline int cpuidle_idle_call(void) { return -ENODEV; }
--
1.7.9.5
Hi,
This patchset takes advantage of the new per-task load tracking that is
available in the kernel for packing small tasks into as few
CPUs/clusters/cores as possible. The main goal of packing small tasks is to
reduce the power consumption in low-load use cases by minimizing the number
of power domains that are enabled. The packing is done in 2 steps:
The 1st step looks for the best place to pack tasks in a system according to
its topology, and it defines a pack buddy CPU for each CPU if one is
available. We define the best CPU during the build of the sched_domain
instead of evaluating it at runtime, because it can be difficult to define a
stable buddy CPU in a low-CPU-load situation. The policy for defining a buddy
CPU is that we pack at all levels inside a node where a group of CPUs can be
power gated independently from the others. To describe this capability, a new
flag, SD_SHARE_POWERDOMAIN, has been introduced; it indicates whether the
groups of CPUs of a scheduling domain share their power state. By default,
this flag is set in all sched_domains in order to keep the current behavior
of the scheduler unchanged, and only the ARM platform clears the
SD_SHARE_POWERDOMAIN flag at the MC and CPU levels.
In a 2nd step, the scheduler checks the load average of a task which wakes up
as well as the load average of the buddy CPU, and it can decide to migrate
light tasks onto a not-busy buddy. This check is done during wake up, because
small tasks tend to wake up between periodic load balances and asynchronously
to each other, which prevents the default mechanism from catching and
migrating them efficiently. A light task is defined by a runnable_avg_sum
that is less than 20% of the runnable_avg_period. In fact, this condition
encloses 2 others: the average CPU load of the task must be less than 20%,
and the task must have been runnable for less than 10ms when it last woke up,
in order to be electable for the packing migration. So, a task that runs 1 ms
every 5 ms will be considered a small task, but a task that runs 50 ms with a
period of 500 ms will not.
Then, the busyness of the buddy CPU depends on the load average of the rq and
the number of running tasks. A CPU with a load average greater than 50% will
be considered busy whatever the number of running tasks is, and this
threshold is reduced by the number of running tasks in order to not increase
the wake-up latency of a task too much. When the buddy CPU is busy,
the scheduler falls back to the default CFS policy.
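Expressed as code, the two checks described above boil down to something like
this sketch (using the per-entity load-tracking fields; the exact form in
fair.c may differ):

/* a "light" task was runnable for less than 20% of its tracked period */
static inline bool is_light_task(struct task_struct *p)
{
        return p->se.avg.runnable_avg_sum * 5 <
               p->se.avg.runnable_avg_period;
}

/* a buddy is busy above 50% load average, and the threshold shrinks
   as running tasks pile up, to keep wake-up latency low */
static bool is_buddy_busy(int cpu)
{
        struct rq *rq = cpu_rq(cpu);

        return rq->avg.runnable_avg_sum >
               rq->avg.runnable_avg_period / (rq->nr_running + 2);
}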
Change since V2:
- Migrate only a task that wakes up
- Change the light tasks threshold to 20%
- Change the loaded CPU threshold to not pull tasks if the current number of
running tasks is null but the load average is already greater than 50%
- Fix the algorithm for selecting the buddy CPU.
Change since V1:
Patch 2/6
- Change the flag name which was not clear. The new name is
SD_SHARE_POWERDOMAIN.
- Create an architecture dependent function to tune the sched_domain flags
Patch 3/6
- Fix issues in the algorithm that looks for the best buddy CPU
- Use pr_debug instead of pr_info
- Fix for uniprocessor
Patch 4/6
- Remove the use of usage_avg_sum which has not been merged
Patch 5/6
- Change the way the coherency of runnable_avg_sum and runnable_avg_period is
ensured
Patch 6/6
- Use the arch dependent function to set/clear SD_SHARE_POWERDOMAIN for ARM
platform
New results for v3:
This series has been tested with hackbench on an ARM platform and the results
don't show any performance regression.
Hackbench 3.9-rc2 +patches
Mean Time (10 tests): 2.048 2.015
stdev : 0.047 0.068
Previous results for V2:
This series has been tested with MP3 playback on an ARM platform:
TC2 HMP (dual CA-15 and 3xCA-7 cluster).
The measurements have been done on an Ubuntu image during 60 seconds of
playback and the result has been normalized to 100.
| CA15 | CA7 | total |
-------------------------------------
default | 81 | 97 | 178 |
pack | 13 | 100 | 113 |
-------------------------------------
Previous results for V1:
The patch-set has been tested on ARM platforms: quad CA-9 SMP and TC2 HMP
(dual CA-15 and 3xCA-7 cluster). For the ARM platform, the results have
demonstrated that it's worth packing small tasks at all topology levels.
The performance tests have been done on both platforms with sysbench. The
results don't show any performance regressions. These results are in line with
the policy, which keeps the normal scheduler behavior for heavy use cases.
test: sysbench --test=cpu --num-threads=N --max-requests=R run
The results below are the average duration of 3 tests on the quad CA-9.
default is the current scheduler behavior (pack buddy CPU is -1)
pack is the scheduler with the pack mechanism
| default | pack |
-----------------------------------
N=8; R=200 | 3.1999 | 3.1921 |
N=8; R=2000 | 31.4939 | 31.4844 |
N=12; R=200 | 3.2043 | 3.2084 |
N=12; R=2000 | 31.4897 | 31.4831 |
N=16; R=200 | 3.1774 | 3.1824 |
N=16; R=2000 | 31.4899 | 31.4897 |
-----------------------------------
The power consumption tests have been done only on the TC2 platform, which has
accessible power lines, and I used cyclictest to simulate small tasks. The
tests show some power consumption improvements.
test: cyclictest -t 8 -q -e 1000000 -D 20 & cyclictest -t 8 -q -e 1000000 -D 20
The measurements have been done during 16 seconds and the result has been
normalized to 100
| CA15 | CA7 | total |
-------------------------------------
default | 100 | 40 | 140 |
pack | <1 | 45 | <46 |
-------------------------------------
The A15 cluster is less power efficient than the A7 cluster, but if we assume
that the tasks are well spread over both clusters, we can roughly estimate
that the power consumption on a dual cluster of CA7 would have been, for a
default kernel:
| CA7 | CA7 | total |
-------------------------------------
default | 40 | 40 | 80 |
-------------------------------------
Vincent Guittot (6):
Revert "sched: Introduce temporary FAIR_GROUP_SCHED dependency for
load-tracking"
sched: add a new SD_SHARE_POWERDOMAIN flag for sched_domain
sched: pack small tasks
sched: secure access to other CPU statistics
sched: pack the idle load balance
ARM: sched: clear SD_SHARE_POWERDOMAIN
arch/arm/kernel/topology.c | 9 +++
arch/ia64/include/asm/topology.h | 1 +
arch/tile/include/asm/topology.h | 1 +
include/linux/sched.h | 9 +--
include/linux/topology.h | 4 +
kernel/sched/core.c | 14 ++--
kernel/sched/fair.c | 149 +++++++++++++++++++++++++++++++++++---
kernel/sched/sched.h | 14 ++--
8 files changed, 169 insertions(+), 32 deletions(-)
--
1.7.9.5
Currently the cpuidle drivers are spread across the different archs.
The patch submissions for cpuidle follow different paths: the cpuidle core
code goes to linux-pm, the ARM drivers go to arm-soc or the SoC-specific
tree, sh goes through the sh arch tree, pseries goes through PowerPC and,
finally, intel goes through Len's tree while acpi_idle goes through linux-pm.
That makes it difficult to consolidate the code and to propagate modifications
from the cpuidle core to the different drivers.
Fortunately, a movement has been initiated to put the cpuidle drivers into
the drivers/cpuidle directory, like cpuidle-calxeda.c and cpuidle-kirkwood.c.
Add an explicit maintainer entry in the MAINTAINERS file to clarify the
situation and prevent new cpuidle drivers from going into an arch directory.
The upstreaming process is unchanged: Rafael takes the patches to merge them
into his tree, but with the acked-by from the driver's maintainer. So the
header must contain the name of the maintainer.
This organization will be the same as for cpufreq.
Signed-off-by: Daniel Lezcano <daniel.lezcano(a)linaro.org>
---
MAINTAINERS | 7 +++++++
drivers/cpuidle/cpuidle-calxeda.c | 4 +++-
drivers/cpuidle/cpuidle-kirkwood.c | 5 +++--
3 files changed, 13 insertions(+), 3 deletions(-)
diff --git a/MAINTAINERS b/MAINTAINERS
index 61677c3..effa0f3 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -2206,6 +2206,13 @@ S: Maintained
F: drivers/cpufreq/
F: include/linux/cpufreq.h
+CPUIDLE DRIVERS
+M: Rafael J. Wysocki <rjw(a)sisk.pl>
+L: linux-pm(a)vger.kernel.org
+S: Maintained
+F: drivers/cpuidle/*
+F: include/linux/cpuidle.h
+
CPU FREQUENCY DRIVERS - ARM BIG LITTLE
M: Viresh Kumar <viresh.kumar(a)linaro.org>
M: Sudeep KarkadaNagesha <sudeep.karkadanagesha(a)arm.com>
diff --git a/drivers/cpuidle/cpuidle-calxeda.c b/drivers/cpuidle/cpuidle-calxeda.c
index e344b56..2378c39 100644
--- a/drivers/cpuidle/cpuidle-calxeda.c
+++ b/drivers/cpuidle/cpuidle-calxeda.c
@@ -1,7 +1,6 @@
/*
* Copyright 2012 Calxeda, Inc.
*
- * Based on arch/arm/plat-mxc/cpuidle.c:
* Copyright 2012 Freescale Semiconductor, Inc.
* Copyright 2012 Linaro Ltd.
*
@@ -16,6 +15,9 @@
*
* You should have received a copy of the GNU General Public License along with
* this program. If not, see <http://www.gnu.org/licenses/>.
+ *
+ * Author : Rob Herring <rob.herring(a)calxeda.com>
+ * Maintainer: Rob Herring <rob.herring(a)calxeda.com>
*/
#include <linux/cpuidle.h>
diff --git a/drivers/cpuidle/cpuidle-kirkwood.c b/drivers/cpuidle/cpuidle-kirkwood.c
index 53290e1..521b0a7 100644
--- a/drivers/cpuidle/cpuidle-kirkwood.c
+++ b/drivers/cpuidle/cpuidle-kirkwood.c
@@ -1,6 +1,4 @@
/*
- * arch/arm/mach-kirkwood/cpuidle.c
- *
* CPU idle Marvell Kirkwood SoCs
*
* This file is licensed under the terms of the GNU General Public
@@ -11,6 +9,9 @@
* to implement two idle states -
* #1 wait-for-interrupt
* #2 wait-for-interrupt and DDR self refresh
+ *
+ * Maintainer: Jason Cooper <jason(a)lakedaemon.net>
+ * Maintainer: Andrew Lunn <andrew(a)lunn.ch>
*/
#include <linux/kernel.h>
--
1.7.9.5
On 25 April 2013 08:16, Tang Yuantian-B29983 <B29983(a)freescale.com> wrote:
> It happened when policy->cpus contains *MORE THAN ONE CPU*.
> Taking my board T4240 for example, it has 3 cluster, 8 CPUs for each cluster.
> The log is:
> # insmod ppc-corenet-cpufreq.ko
> ppc_corenet_cpufreq: Freescale PowerPC corenet CPU frequency scaling driver
> # rmmod ppc-corenet-cpufreq.ko
> ERROR: Module ppc_corenet_cpufreq is in use
> # lsmod
> Module Size Used by
> ppc_corenet_cpufreq 6542 9
> # uname -a
> Linux T4240 3.9.0-rc1-11081-g34642bb-dirty #44 SMP Thu Apr 25 08:58:26 CST 2013 ppc64 unknown
>
> I am not using the newest kernel (since the new t4240 board is not included
> yet), but the issue is still there.
> The reason is just as I said in the patch.
I believed what you said was correct and went on to test this on my platform:
2 clusters with 2 and 3 cpus... so I have multiple cpus per cluster or
policy structure.
insmod/rmmod worked as expected, without any issues.
So, for me there are no such issues. BTW, I tested this on the latest rc from
Linus and also on the latest code from linux-next.
I am sure the counts are very well balanced and there are no issues in the
latest code, at least.
On my SMP platform, which is made of 5 cores in 2 clusters, the
nr_busy_cpus field of the sched_group_power struct is not null when the
platform is fully idle. The root cause is:
During the boot sequence, some CPUs reach the idle loop and set their
NOHZ_IDLE flag while waiting for other CPUs to boot. But the nr_busy_cpus
field is initialized later, with the assumption that all CPUs are in the busy
state, whereas some CPUs have already set their NOHZ_IDLE flag.
More generally, the NOHZ_IDLE flag must be initialized when new sched_domains
are created in order to ensure that NOHZ_IDLE and nr_busy_cpus are aligned.
This condition can be ensured by adding a synchronize_rcu() between the
destruction of the old sched_domains and the creation of the new ones, so the
NOHZ_IDLE flag will not be updated with an old sched_domain once it has been
initialized. But this solution introduces an additional latency in the
rebuild sequence that is called during cpu hotplug.
As suggested by Frederic Weisbecker, another solution is to have the same
RCU lifecycle for both NOHZ_IDLE and the sched_domain struct.
A new nohz_idle field is added to sched_domain so both the status and the
sched_domain will share the same RCU lifecycle and will always be
synchronized. In addition, there is no more need to protect nohz_idle against
concurrent access, as it is only modified by 2 exclusive functions called by
the local cpu.
This solution has been preferred to the creation of a new struct with an
extra pointer indirection for sched_domain.
The synchronization is done at the cost of:
- An additional indirection and an rcu_dereference for accessing nohz_idle.
- We use only the nohz_idle field of the top sched_domain.
Change since v7:
- remove atomic access which is useless now.
- refactor the sequence that update nohz_idle status and nr_busy_cpus.
Change since v6:
- Add the flags in struct sched_domain instead of creating a sched_domain_rq.
Change since v5:
- minor variable and function name change.
- remove a useless null check before kfree
- fix a compilation error when NO_HZ is not set.
Change since v4:
- link both sched_domain and NOHZ_IDLE flag in one RCU object so
their states are always synchronized.
Change since V3:
- NOHZ flag is not cleared if a NULL domain is attached to the CPU
- Remove patch 2/2 which becomes useless with latest modifications
Change since V2:
- change the initialization to idle state instead of busy state so a CPU that
enters idle during the build of the sched_domain will not corrupt the
initialization state
Change since V1:
- remove the patch for SCHED softirq on an idle core use case as it was
a side effect of the other use cases.
Signed-off-by: Vincent Guittot <vincent.guittot(a)linaro.org>
---
include/linux/sched.h | 3 +++
kernel/sched/fair.c | 26 ++++++++++++++++----------
kernel/sched/sched.h | 1 -
3 files changed, 19 insertions(+), 11 deletions(-)
diff --git a/include/linux/sched.h b/include/linux/sched.h
index d35d2b6..22bcbe8 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -899,6 +899,9 @@ struct sched_domain {
unsigned int wake_idx;
unsigned int forkexec_idx;
unsigned int smt_gain;
+#ifdef CONFIG_NO_HZ
+ int nohz_idle; /* NOHZ IDLE status */
+#endif
int flags; /* See SD_* */
int level;
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 7a33e59..5db1817 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -5395,13 +5395,16 @@ static inline void set_cpu_sd_state_busy(void)
struct sched_domain *sd;
int cpu = smp_processor_id();
- if (!test_bit(NOHZ_IDLE, nohz_flags(cpu)))
- return;
- clear_bit(NOHZ_IDLE, nohz_flags(cpu));
-
rcu_read_lock();
- for_each_domain(cpu, sd)
+ sd = rcu_dereference_check_sched_domain(cpu_rq(cpu)->sd);
+
+ if (!sd || !sd->nohz_idle)
+ goto unlock;
+ sd->nohz_idle = 0;
+
+ for (; sd; sd = sd->parent)
atomic_inc(&sd->groups->sgp->nr_busy_cpus);
+unlock:
rcu_read_unlock();
}
@@ -5410,13 +5413,16 @@ void set_cpu_sd_state_idle(void)
struct sched_domain *sd;
int cpu = smp_processor_id();
- if (test_bit(NOHZ_IDLE, nohz_flags(cpu)))
- return;
- set_bit(NOHZ_IDLE, nohz_flags(cpu));
-
rcu_read_lock();
- for_each_domain(cpu, sd)
+ sd = rcu_dereference_check_sched_domain(cpu_rq(cpu)->sd);
+
+ if (!sd || sd->nohz_idle)
+ goto unlock;
+ sd->nohz_idle = 1;
+
+ for (; sd; sd = sd->parent)
atomic_dec(&sd->groups->sgp->nr_busy_cpus);
+unlock:
rcu_read_unlock();
}
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index cc03cfd..03b13c8 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -1187,7 +1187,6 @@ extern void account_cfs_bandwidth_used(int enabled, int was_enabled);
enum rq_nohz_flag_bits {
NOHZ_TICK_STOPPED,
NOHZ_BALANCE_KICK,
- NOHZ_IDLE,
};
#define nohz_flags(cpu) (&cpu_rq(cpu)->nohz_flags)
--
1.7.9.5
Hi,
Working as a newbie in the PMWG, I noticed I'm not able to resume my
pandaboard-es with the latest 3.9 kernel from Linus (configuration file
omap2plus_defconfig). Suspend/resume appears to work with the Linaro 12.11
release; I managed to wake it up with a USB keyboard. There is also
launchpad bug 989547 that is still open. Any updates on this issue?
Thanks,
Zoran
This patchset was called: "Create sched_select_cpu() and use it for workqueues"
for the first three versions.
Earlier discussions over v3, v2 and v1 can be found here:
https://lkml.org/lkml/2013/3/18/364
http://lists.linaro.org/pipermail/linaro-dev/2012-November/014344.html
http://www.mail-archive.com/linaro-dev@lists.linaro.org/msg13342.html
For power saving it is better to schedule work on cpus that aren't idle, as
bringing a cpu/cluster out of an idle state can be very costly (both
performance and power wise). Earlier we tried to use the timer infrastructure
to take this decision, but we found out later that the scheduler gives even
better results, and so we should use the scheduler for choosing the cpu on
which to schedule work.
In the workqueue subsystem, workqueues with the WQ_UNBOUND flag are the ones
which let the scheduler select the target cpu.
Here we are migrating a few users of workqueues to WQ_UNBOUND. These drivers
were found to be very active on an idle or lightly busy system, and using
WQ_UNBOUND for them gave impressive results.
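The conversion itself is mechanical; for the PHY state machine, for instance,
it is roughly the sketch below (shown with the new system_freezable_unbound_wq
from patch 1/4; each patch may pick a different unbound queue):

/* before: the delayed work is queued on the local (possibly idle) cpu */
schedule_delayed_work(&phydev->state_queue, PHY_STATE_TIME * HZ);

/* after: an unbound workqueue lets the scheduler pick a non-idle cpu */
queue_delayed_work(system_freezable_unbound_wq,
                   &phydev->state_queue, PHY_STATE_TIME * HZ);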
Setup:
-----
- ARM Vexpress TC2 - big.LITTLE CPU
- Core 0-1: A15, 2-4: A7
- rootfs: linaro-ubuntu-devel
This patchset has been tested on a big.LITTLE system (heterogeneous), but is
useful for all other, homogeneous, systems as well. During these tests audio
was played in the background using aplay.
Results:
-------
Cluster A15 Energy Cluster A7 Energy Total
------------------------- ----------------------- ------
Without this patchset (Energy in Joules):
---------------------------------------------------
0.151162 2.183545 2.334707
0.223730 2.687067 2.910797
0.289687 2.732702 3.022389
0.454198 2.745908 3.200106
0.495552 2.746465 3.242017
Average:
0.322866 2.619137 2.942003
With this patchset (Energy in Joules):
-----------------------------------------------
0.226421 2.283658 2.510079
0.151361 2.236656 2.388017
0.197726 2.249849 2.447575
0.221915 2.229446 2.451361
0.347098 2.257707 2.604805
Average:
0.2289042 2.2514632 2.4803674
The above tests were repeated multiple times and events were tracked using
trace-cmd and analysed using kernelshark. It was easily noticeable that the
idle time of many cpus increased considerably, which eventually saved some
power.
PS: All the earlier Acks we got for the drivers are dropped here, as the
patches have been updated significantly.
V3->V4:
-------
- Dropped changes to kernel/sched directory and hence
sched_select_non_idle_cpu().
- Dropped queue_work_on_any_cpu()
- Created system_freezable_unbound_wq
- Changed all patches accordingly.
V2->V3:
-------
- Dropped changes into core queue_work() API, rather create *_on_any_cpu()
APIs
- Dropped running timers migration patch as that was broken
- Migrated few users of workqueues to use *_on_any_cpu() APIs.
Viresh Kumar (4):
workqueue: Add system wide system_freezable_unbound_wq
PHYLIB: queue work on unbound wq
block: queue work on unbound wq
fbcon: queue work on unbound wq
block/blk-core.c | 3 ++-
block/blk-ioc.c | 2 +-
block/genhd.c | 10 ++++++----
drivers/net/phy/phy.c | 9 +++++----
drivers/video/console/fbcon.c | 2 +-
include/linux/workqueue.h | 4 ++++
kernel/workqueue.c | 7 ++++++-
7 files changed, 25 insertions(+), 12 deletions(-)
--
1.7.12.rc2.18.g61b472e
This patch series provides some code consolidation across the different
cpuidle drivers. It contains two parts: the first is the removal of
the time keeping flag, and the second is a common initialization routine.
All the drivers use the en_core_tk_irqen flag, which means it is not necessary
to make the time computation optional. We can remove this flag and assume the
cpuidle framework always manages this operation.
The cpuidle initialization code is duplicated across the different drivers in
the same manner.
The repeating pattern is:
SMP:
cpuidle_register_driver(drv);
for_each_possible_cpu(cpu) {
dev = per_cpu(cpuidle_device, cpu);
cpuidle_register_device(dev);
}
UP:
cpuidle_register_driver(drv);
cpuidle_register_device(dev);
As on a UP machine the 'for_each_possible_cpu' macro is a single-iteration
loop, reusing the SMP initialization loop for UP just works.
The patchset does some cleanup in the different drivers in order to make the
init code the same. Then it introduces a generic function:
cpuidle_register(struct cpuidle_driver *drv, struct cpumask *cpumask)
The cpumask is for the coupled idle states.
The drivers are then modified to take this new function into account and
to remove the duplicated code.
The benefit is observable in the diffstat: 332 lines of code removed.
Changelog:
- V3:
* folded patch 5/19 into 19/19, they were:
* ARM: imx: cpuidle: use init/exit common routine
* ARM: imx: cpuidle: create separate drivers for imx5/imx6
* removed rule to make cpuidle.o in the imx's Makefile
* split patch 1/19 into two, they are:
* [V3 patch 01/19] ARM: shmobile: cpuidle: remove shmobile_enter_wfi
* [V3 patch 02/19] ARM: shmobile: cpuidle: remove shmobile_enter_wfi prototype
- V2:
* fixed cpumask NULL test for coupled state in cpuidle_register
* added comment about structure copy
* replaced printk with pr_err
* folded the split message
* fixed return code in cpuidle_register
* updated Documentation/cpuidle/drivers.txt
* added in the changelog dev->state_count is filled by cpuidle_enable_device
* fixed tag for tegra in the first line patch description
* fixed tegra2 removed tegra_tear_down_cpu = tegra20_tear_down_cpu;
- V1: Initial post
Tested-on: u8500
Tested-on: at91
Tested-on: intel i5
Tested-on: OMAP4
Compiled with and without CPU_IDLE for:
u8500, at91, davinci, exynos, imx5, imx6, kirkwood, multi_v7 (for calxeda),
omap2plus, s3c64, tegra1, tegra2, tegra3
Daniel Lezcano (19):
ARM: shmobile: cpuidle: remove shmobile_enter_wfi function
ARM: shmobile: cpuidle: remove shmobile_enter_wfi prototype
ARM: OMAP3: remove cpuidle_wrap_enter
cpuidle: remove en_core_tk_irqen flag
ARM: ux500: cpuidle: replace for_each_online_cpu by
for_each_possible_cpu
cpuidle: make a single register function for all
ARM: ux500: cpuidle: use init/exit common routine
ARM: at91: cpuidle: use init/exit common routine
ARM: OMAP3: cpuidle: use init/exit common routine
ARM: s3c64xx: cpuidle: use init/exit common routine
ARM: tegra: cpuidle: use init/exit common routine
ARM: shmobile: cpuidle: use init/exit common routine
ARM: OMAP4: cpuidle: use init/exit common routine
ARM: tegra: cpuidle: use init/exit common routine for tegra2
ARM: tegra: cpuidle: use init/exit common routine for tegra3
ARM: calxeda: cpuidle: use init/exit common routine
ARM: kirkwood: cpuidle: use init/exit common routine
ARM: davinci: cpuidle: use init/exit common routine
ARM: imx: cpuidle: use init/exit common routine
Documentation/cpuidle/driver.txt | 6 +
arch/arm/mach-at91/cpuidle.c | 18 +--
arch/arm/mach-davinci/cpuidle.c | 21 +---
arch/arm/mach-exynos/cpuidle.c | 1 -
arch/arm/mach-imx/Makefile | 2 +-
arch/arm/mach-imx/cpuidle-imx5.c | 40 +++++++
arch/arm/mach-imx/cpuidle-imx6q.c | 3 +-
arch/arm/mach-imx/cpuidle.c | 80 -------------
arch/arm/mach-imx/cpuidle.h | 10 +-
arch/arm/mach-imx/pm-imx5.c | 30 +----
arch/arm/mach-omap2/cpuidle34xx.c | 49 ++------
arch/arm/mach-omap2/cpuidle44xx.c | 23 +---
arch/arm/mach-s3c64xx/cpuidle.c | 15 +--
arch/arm/mach-shmobile/cpuidle.c | 11 +-
arch/arm/mach-shmobile/include/mach/common.h | 3 -
arch/arm/mach-shmobile/pm-sh7372.c | 2 -
arch/arm/mach-tegra/cpuidle-tegra114.c | 27 +----
arch/arm/mach-tegra/cpuidle-tegra20.c | 31 +----
arch/arm/mach-tegra/cpuidle-tegra30.c | 28 +----
arch/arm/mach-ux500/cpuidle.c | 33 +-----
arch/powerpc/platforms/pseries/processor_idle.c | 1 -
arch/sh/kernel/cpu/shmobile/cpuidle.c | 1 -
arch/x86/kernel/apm_32.c | 1 -
drivers/acpi/processor_idle.c | 1 -
drivers/cpuidle/cpuidle-calxeda.c | 53 +--------
drivers/cpuidle/cpuidle-kirkwood.c | 18 +--
drivers/cpuidle/cpuidle.c | 144 ++++++++++++++---------
drivers/idle/intel_idle.c | 1 -
include/linux/cpuidle.h | 20 ++--
29 files changed, 175 insertions(+), 498 deletions(-)
create mode 100644 arch/arm/mach-imx/cpuidle-imx5.c
delete mode 100644 arch/arm/mach-imx/cpuidle.c
--
1.7.9.5
While migrating to the common clock framework (CCF), I found that the FIMD
clocks were pulled down by the CCF.
If the CCF finds any clock(s) which have NOT been claimed by any of the
drivers, then such clock(s) are PULLed low by the CCF.
Calling clk_prepare() for the FIMD clocks fixes the issue.
This patch also replaces clk_disable() with clk_unprepare() during exit, since
clk_prepare() is called in fimd_probe().
Signed-off-by: Vikas Sajjan <vikas.sajjan(a)linaro.org>
---
Changes since v3:
- added clk_prepare() in fimd_probe() and clk_unprepare() in fimd_remove()
as suggested by Viresh Kumar <viresh.kumar(a)linaro.org>
Changes since v2:
- moved clk_prepare_enable() and clk_disable_unprepare() from
fimd_probe() to fimd_clock() as suggested by Inki Dae <inki.dae(a)samsung.com>
Changes since v1:
- added error checking for clk_prepare_enable() and also replaced
clk_disable() with clk_disable_unprepare() during exit.
---
drivers/gpu/drm/exynos/exynos_drm_fimd.c | 14 ++++++++++++--
1 file changed, 12 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/exynos/exynos_drm_fimd.c b/drivers/gpu/drm/exynos/exynos_drm_fimd.c
index 9537761..aa22370 100644
--- a/drivers/gpu/drm/exynos/exynos_drm_fimd.c
+++ b/drivers/gpu/drm/exynos/exynos_drm_fimd.c
@@ -934,6 +934,16 @@ static int fimd_probe(struct platform_device *pdev)
return ret;
}
+ ret = clk_prepare(ctx->bus_clk);
+ if (ret < 0)
+ return ret;
+
+ ret = clk_prepare(ctx->lcd_clk);
+ if (ret < 0) {
+ clk_unprepare(ctx->bus_clk);
+ return ret;
+ }
+
ctx->vidcon0 = pdata->vidcon0;
ctx->vidcon1 = pdata->vidcon1;
ctx->default_win = pdata->default_win;
@@ -981,8 +991,8 @@ static int fimd_remove(struct platform_device *pdev)
if (ctx->suspended)
goto out;
- clk_disable(ctx->lcd_clk);
- clk_disable(ctx->bus_clk);
+ clk_unprepare(ctx->lcd_clk);
+ clk_unprepare(ctx->bus_clk);
pm_runtime_set_suspended(dev);
pm_runtime_put_sync(dev);
--
1.7.9.5
Hi Nico & all,
We are doing some profiling on the TC2 board for low power modes, and found
some long latencies in the core/cluster power-on sequence, so we want to
confirm the questions below:
1. From our profiling results, we found that the interval from core_A sending
an IPI to core_B until core_B runs into the function bL_entry_point (or the
function mcpm_entry_point in your later patches for mainline) is about
954us, which is a really long time.
The firmware we use is the 13.01 version (which supports the BX_ADDRx
registers), so the cluster-level power-on sequence should be:
a) the DCC detects the nIRQOUT/nFIQOUT assertion;
b) the DCC powers on the corresponding cluster;
c) the core runs into the boot monitor code and finally uses the
BX_ADDRx register to jump to the function *bL_entry_point*.
Since the above flows are a black box for us, we suspect the time is
consumed by one of these steps; could you or the ARM guys help confirm
this?
2. From reading the spec DAI0318D_v2p_ca15_a7_power_management.pdf and
from confirmation by ARM support, we know there is only cluster-level
power down, via the CA15_PWRDN_EN/CA7_PWRDN_EN bits.
At the core level, we can NOT independently power off a core if the
other cores in the same cluster are still powered on. But this conflicts
with TC2's power management code in tc2_pm.c.
We can see that the function *tc2_pm_down()* calls
gic_cpu_if_down() to disable the GIC's cpu interface; that means the core
cannot receive interrupts anymore and will run into WFI. After
the core runs into WFI, if the DCC/SPC detects interrupts on the
GIC's nIRQOUT/nFIQOUT pins, then the DCC/SPC will power on the core (or
reset the core) to let it resume, and then s/w needs to enable the
GIC's cpu interface for itself.
Here the questions are:
a) in the function *tc2_pm_down()*, after the core runs into the WFI state,
although the DCC/SPC cannot power off the core if it is NOT the last man
of the cluster, the DCC/SPC will still reset the core, right?
b) how does the DCC/SPC decide whether the core wants to enter the C1 state
or only the "WFI" state? Does it use the WAKE_INT_MASK bits as the flag?
--
Thx,
Leo Yan
On my SMP platform, which is made of 5 cores in 2 clusters, the
nr_busy_cpus field of the sched_group_power struct is not zero when the
platform is fully idle. The root cause is:
During the boot sequence, some CPUs reach the idle loop and set their
NOHZ_IDLE flag while waiting for other CPUs to boot. But the nr_busy_cpus
field is initialized later, with the assumption that all CPUs are in the busy
state, whereas some CPUs have already set their NOHZ_IDLE flag.
More generally, the NOHZ_IDLE flag must be initialized when new sched_domains
are created in order to ensure that NOHZ_IDLE and nr_busy_cpus stay aligned.
This condition can be ensured by adding a synchronize_rcu between the
destruction of old sched_domains and the creation of new ones, so the NOHZ_IDLE
flag will not be updated through an old sched_domain once it has been
initialized. But this solution introduces an additional latency in the rebuild
sequence that is called during cpu hotplug.
As suggested by Frederic Weisbecker, another solution is to have the same
RCU lifecycle for both the NOHZ_IDLE flag and the sched_domain struct.
A new nohz_flags field has been added to sched_domain so that the flags and the
sched_domain will share the same RCU lifecycle and will always be
synchronized. This solution is preferred to the creation of a new struct with
an extra pointer indirection.
The synchronization is done at the cost of:
- an additional indirection and an rcu_dereference for accessing the NOHZ_IDLE
flag;
- using only the nohz_flags field of the top sched_domain.
Change since v6:
- Add the flags in struct sched_domain instead of creating a sched_domain_rq.
Change since v5:
- minor variable and function name change.
- remove a useless null check before kfree
- fix a compilation error when NO_HZ is not set.
Change since v4:
- link both sched_domain and NOHZ_IDLE flag in one RCU object so
their states are always synchronized.
Change since V3:
- NOHZ flag is not cleared if a NULL domain is attached to the CPU
- Remove patch 2/2 which becomes useless with latest modifications
Change since V2:
- change the initialization to idle state instead of busy state so a CPU that
enters idle during the build of the sched_domain will not corrupt the
initialization state
Change since V1:
- remove the patch for SCHED softirq on an idle core use case as it was
a side effect of the other use cases.
Signed-off-by: Vincent Guittot <vincent.guittot(a)linaro.org>
---
include/linux/sched.h | 1 +
kernel/sched/fair.c | 34 ++++++++++++++++++++++++----------
2 files changed, 25 insertions(+), 10 deletions(-)
diff --git a/include/linux/sched.h b/include/linux/sched.h
index d35d2b6..cde4f7f 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -899,6 +899,7 @@ struct sched_domain {
unsigned int wake_idx;
unsigned int forkexec_idx;
unsigned int smt_gain;
+ unsigned long nohz_flags; /* NOHZ_IDLE flag status */
int flags; /* See SD_* */
int level;
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 7a33e59..09e440f 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -5394,14 +5394,21 @@ static inline void set_cpu_sd_state_busy(void)
{
struct sched_domain *sd;
int cpu = smp_processor_id();
-
- if (!test_bit(NOHZ_IDLE, nohz_flags(cpu)))
- return;
- clear_bit(NOHZ_IDLE, nohz_flags(cpu));
+ int first_nohz_idle = 1;
rcu_read_lock();
- for_each_domain(cpu, sd)
+ for_each_domain(cpu, sd) {
+ if (first_nohz_idle) {
+ if (!test_bit(NOHZ_IDLE, &sd->nohz_flags))
+ goto unlock;
+
+ clear_bit(NOHZ_IDLE, &sd->nohz_flags);
+ first_nohz_idle = 0;
+ }
+
atomic_inc(&sd->groups->sgp->nr_busy_cpus);
+ }
+unlock:
rcu_read_unlock();
}
@@ -5409,14 +5416,21 @@ void set_cpu_sd_state_idle(void)
{
struct sched_domain *sd;
int cpu = smp_processor_id();
-
- if (test_bit(NOHZ_IDLE, nohz_flags(cpu)))
- return;
- set_bit(NOHZ_IDLE, nohz_flags(cpu));
+ int first_nohz_idle = 1;
rcu_read_lock();
- for_each_domain(cpu, sd)
+ for_each_domain(cpu, sd) {
+ if (first_nohz_idle) {
+ if (test_bit(NOHZ_IDLE, &sd->nohz_flags))
+ goto unlock;
+
+ set_bit(NOHZ_IDLE, &sd->nohz_flags);
+ first_nohz_idle = 0;
+ }
+
atomic_dec(&sd->groups->sgp->nr_busy_cpus);
+ }
+unlock:
rcu_read_unlock();
}
--
1.7.9.5
From: Pranavkumar Sawargaonkar <pranavkumar(a)linaro.org>
This patch implements early printk support for virtio-mmio console devices
without using any hypercalls.
The current virtio early printk code in the kernel expects the hypervisor to
provide some mechanism, generally a hypercall, to support early printk. This
patch does not break the existing hypercall-based early print support.
This implementation adds:
1. An early read-write register named early_rw in the virtio console's config
space.
2. Two host feature flags, namely VIRTIO_CONSOLE_F_EARLY_READ and
VIRTIO_CONSOLE_F_EARLY_WRITE, for telling the guest about the early-read and
early-write capability of the console device.
Early write mechanism:
1. When a guest wants to output a character, it simply writes the character to
the early_rw register in the config space of the virtio console device.
Early read mechanism:
1. When a guest wants to input a character, it simply reads the early_rw
register in the config space of the virtio console device. Let's say we get a
32-bit value X.
2. If the most significant bit of X is set (i.e. X & 0x80000000 == 0x80000000),
then the least significant 8 bits of X represent the input character; otherwise
the guest needs to try reading the early_rw register again.
Note: This patch only includes the kernel-side changes for early printk; the
host/hypervisor-side emulation of the early_rw register is out of scope here.
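As an illustration only, a guest-side early read following the protocol above
could look like the sketch below; it mirrors the write path added by this
patch, and the function name is hypothetical, not part of the patch:
static int virtio_console_early_getchar(void)
{
	struct virtio_console_config *p = early_base + VIRTIO_MMIO_CONFIG;
	u32 val;
	/* device-ID and VIRTIO_CONSOLE_F_EARLY_READ checks omitted here */
	val = readl_relaxed(&p->early_rw);
	if (val & 0x80000000)		/* MSB set: low 8 bits are valid */
		return val & 0xff;
	return -1;			/* nothing available; try again later */
}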
Signed-off-by: Anup Patel <anup.patel(a)linaro.org>
---
arch/arm64/kernel/early_printk.c | 24 ++++++++++++++++++++++++
include/uapi/linux/virtio_console.h | 4 ++++
2 files changed, 28 insertions(+)
diff --git a/arch/arm64/kernel/early_printk.c b/arch/arm64/kernel/early_printk.c
index ac974f4..a82b5aa 100644
--- a/arch/arm64/kernel/early_printk.c
+++ b/arch/arm64/kernel/early_printk.c
@@ -25,6 +25,9 @@
#include <linux/amba/serial.h>
#include <linux/serial_reg.h>
+#include <linux/virtio_ids.h>
+#include <linux/virtio_mmio.h>
+#include <linux/virtio_console.h>
static void __iomem *early_base;
static void (*printch)(char ch);
@@ -53,6 +56,26 @@ static void smh_printch(char ch)
}
/*
+ * VIRTIO MMIO based debug console.
+ */
+static void virtio_console_early_printch(char ch)
+{
+ u32 tmp;
+ struct virtio_console_config *p = early_base + VIRTIO_MMIO_CONFIG;
+
+ tmp = readl_relaxed(early_base + VIRTIO_MMIO_DEVICE_ID);
+ if (tmp != VIRTIO_ID_CONSOLE) {
+ return;
+ }
+
+ tmp = readl_relaxed(early_base + VIRTIO_MMIO_HOST_FEATURES);
+ if (!(tmp & (1 << VIRTIO_CONSOLE_F_EARLY_WRITE))) {
+ return;
+ }
+ writeb_relaxed(ch, &p->early_rw);
+}
+
+/*
* 8250/16550 (8-bit aligned registers) single character TX.
*/
static void uart8250_8bit_printch(char ch)
@@ -82,6 +105,7 @@ static const struct earlycon_match earlycon_match[] __initconst = {
{ .name = "smh", .printch = smh_printch, },
{ .name = "uart8250-8bit", .printch = uart8250_8bit_printch, },
{ .name = "uart8250-32bit", .printch = uart8250_32bit_printch, },
+ { .name = "virtio-console", .printch = virtio_console_early_printch, },
{}
};
diff --git a/include/uapi/linux/virtio_console.h b/include/uapi/linux/virtio_console.h
index ee13ab6..1171cb4 100644
--- a/include/uapi/linux/virtio_console.h
+++ b/include/uapi/linux/virtio_console.h
@@ -38,6 +38,8 @@
/* Feature bits */
#define VIRTIO_CONSOLE_F_SIZE 0 /* Does host provide console size? */
#define VIRTIO_CONSOLE_F_MULTIPORT 1 /* Does host provide multiple ports? */
+#define VIRTIO_CONSOLE_F_EARLY_READ 2 /* Does host support early read? */
+#define VIRTIO_CONSOLE_F_EARLY_WRITE 3 /* Does host support early write? */
#define VIRTIO_CONSOLE_BAD_ID (~(u32)0)
@@ -48,6 +50,8 @@ struct virtio_console_config {
__u16 rows;
/* max. number of ports this device can hold */
__u32 max_nr_ports;
+ /* early read/write register */
+ __u32 early_rw;
} __attribute__((packed));
/*
--
1.7.9.5
While migrating to the common clock framework (CCF), I found that the FIMD
clocks were pulled down by the CCF.
If the CCF finds any clock(s) which have NOT been claimed by any of the
drivers, then such clock(s) are pulled low by the CCF.
Calling clk_prepare_enable() for the FIMD clocks fixes the issue.
This patch also replaces clk_disable() with clk_disable_unprepare()
during exit.
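For reference, clk_prepare_enable() and clk_disable_unprepare() are just the
two-step clock calls combined; roughly (a sketch of the inline helpers in
linux/clk.h, not code from this patch):
static inline int clk_prepare_enable(struct clk *clk)
{
	int ret;
	ret = clk_prepare(clk);		/* may sleep */
	if (ret)
		return ret;
	ret = clk_enable(clk);		/* atomic-safe */
	if (ret)
		clk_unprepare(clk);
	return ret;
}
static inline void clk_disable_unprepare(struct clk *clk)
{
	clk_disable(clk);
	clk_unprepare(clk);
}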
Signed-off-by: Vikas Sajjan <vikas.sajjan(a)linaro.org>
---
Changes since v2:
- moved clk_prepare_enable() and clk_disable_unprepare() from
fimd_probe() to fimd_clock() as suggested by Inki Dae <inki.dae(a)samsung.com>
Changes since v1:
- added error checking for clk_prepare_enable() and also replaced
clk_disable() with clk_disable_unprepare() during exit.
---
drivers/gpu/drm/exynos/exynos_drm_fimd.c | 14 +++++++-------
1 file changed, 7 insertions(+), 7 deletions(-)
diff --git a/drivers/gpu/drm/exynos/exynos_drm_fimd.c b/drivers/gpu/drm/exynos/exynos_drm_fimd.c
index 9537761..f2400c8 100644
--- a/drivers/gpu/drm/exynos/exynos_drm_fimd.c
+++ b/drivers/gpu/drm/exynos/exynos_drm_fimd.c
@@ -799,18 +799,18 @@ static int fimd_clock(struct fimd_context *ctx, bool enable)
if (enable) {
int ret;
- ret = clk_enable(ctx->bus_clk);
+ ret = clk_prepare_enable(ctx->bus_clk);
if (ret < 0)
return ret;
- ret = clk_enable(ctx->lcd_clk);
+ ret = clk_prepare_enable(ctx->lcd_clk);
if (ret < 0) {
- clk_disable(ctx->bus_clk);
+ clk_disable_unprepare(ctx->bus_clk);
return ret;
}
} else {
- clk_disable(ctx->lcd_clk);
- clk_disable(ctx->bus_clk);
+ clk_disable_unprepare(ctx->lcd_clk);
+ clk_disable_unprepare(ctx->bus_clk);
}
return 0;
@@ -981,8 +981,8 @@ static int fimd_remove(struct platform_device *pdev)
if (ctx->suspended)
goto out;
- clk_disable(ctx->lcd_clk);
- clk_disable(ctx->bus_clk);
+ clk_disable_unprepare(ctx->lcd_clk);
+ clk_disable_unprepare(ctx->bus_clk);
pm_runtime_set_suspended(dev);
pm_runtime_put_sync(dev);
--
1.7.9.5
On my SMP platform, which is made of 5 cores in 2 clusters, the
nr_busy_cpus field of the sched_group_power struct is not zero when the
platform is fully idle. The root cause is:
During the boot sequence, some CPUs reach the idle loop and set their
NOHZ_IDLE flag while waiting for other CPUs to boot. But the nr_busy_cpus
field is initialized later, with the assumption that all CPUs are in the busy
state, whereas some CPUs have already set their NOHZ_IDLE flag.
More generally, the NOHZ_IDLE flag must be initialized when new sched_domains
are created in order to ensure that NOHZ_IDLE and nr_busy_cpus stay aligned.
This condition can be ensured by adding a synchronize_rcu between the
destruction of old sched_domains and the creation of new ones, so the NOHZ_IDLE
flag will not be updated through an old sched_domain once it has been
initialized. But this solution introduces an additional latency in the rebuild
sequence that is called during cpu hotplug.
As suggested by Frederic Weisbecker, another solution is to have the same
RCU lifecycle for both the NOHZ_IDLE flag and the sched_domain struct. I have
introduced a new sched_domain_rq struct that is the entry point for both the
sched_domains and the objects that must follow the same lifecycle, like the
NOHZ_IDLE flags. They will share the same RCU lifecycle and will always be
synchronized.
The synchronization is done at the cost of:
- an additional indirection for accessing the first sched_domain level;
- an additional indirection and an rcu_dereference before accessing the
NOHZ_IDLE flag.
Change since v5:
- minor variable and function name change.
- remove a useless null check before kfree
- fix a compilation error when NO_HZ is not set.
Change since v4:
- link both sched_domain and NOHZ_IDLE flag in one RCU object so
their states are always synchronized.
Change since V3:
- NOHZ flag is not cleared if a NULL domain is attached to the CPU
- Remove patch 2/2 which becomes useless with latest modifications
Change since V2:
- change the initialization to idle state instead of busy state so a CPU that
enters idle during the build of the sched_domain will not corrupt the
initialization state
Change since V1:
- remove the patch for SCHED softirq on an idle core use case as it was
a side effect of the other use cases.
Signed-off-by: Vincent Guittot <vincent.guittot(a)linaro.org>
---
include/linux/sched.h | 12 ++++++
kernel/sched/core.c | 106 ++++++++++++++++++++++++++++++++++++++++++++-----
kernel/sched/fair.c | 35 +++++++++++-----
kernel/sched/sched.h | 24 +++++++++--
4 files changed, 152 insertions(+), 25 deletions(-)
diff --git a/include/linux/sched.h b/include/linux/sched.h
index d35d2b6..61ad5f1 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -959,6 +959,18 @@ struct sched_domain {
unsigned long span[0];
};
+/*
+ * Some flags must stay synchronized with fields of sched_group_power and as a
+ * consequence they must follow the same lifecycle for the lockless scheme.
+ * sched_domain_rq encapsulates those flags and sched_domains in one RCU
+ * object.
+ */
+struct sched_domain_rq {
+ struct sched_domain *sd;
+ unsigned long flags;
+ struct rcu_head rcu; /* used during destruction */
+};
+
static inline struct cpumask *sched_domain_span(struct sched_domain *sd)
{
return to_cpumask(sd->span);
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 67d0465..d0d3020 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -5604,6 +5604,15 @@ static void destroy_sched_domains(struct sched_domain *sd, int cpu)
destroy_sched_domain(sd, cpu);
}
+static void destroy_sched_domain_rq(struct sched_domain_rq *sd_rq, int cpu)
+{
+ if (!sd_rq)
+ return;
+
+ destroy_sched_domains(sd_rq->sd, cpu);
+ kfree_rcu(sd_rq, rcu);
+}
+
/*
* Keep a special pointer to the highest sched_domain that has
* SD_SHARE_PKG_RESOURCE set (Last Level Cache Domain) for this
@@ -5634,10 +5643,23 @@ static void update_top_cache_domain(int cpu)
* hold the hotplug lock.
*/
static void
-cpu_attach_domain(struct sched_domain *sd, struct root_domain *rd, int cpu)
+cpu_attach_domain(struct sched_domain_rq *sd_rq, struct root_domain *rd,
+ int cpu)
{
struct rq *rq = cpu_rq(cpu);
- struct sched_domain *tmp;
+ struct sched_domain_rq *old_sd_rq;
+ struct sched_domain *tmp, *sd = NULL;
+
+ /*
+ * If we don't have any sched_domain and associated object, we can
+ * directly jump to the attach sequence otherwise we try to degenerate
+ * the sched_domain
+ */
+ if (!sd_rq)
+ goto attach;
+
+ /* Get a pointer to the 1st sched_domain */
+ sd = sd_rq->sd;
/* Remove the sched domains which do not contribute to scheduling. */
for (tmp = sd; tmp; ) {
@@ -5660,14 +5682,17 @@ cpu_attach_domain(struct sched_domain *sd, struct root_domain *rd, int cpu)
destroy_sched_domain(tmp, cpu);
if (sd)
sd->child = NULL;
+ /* update sched_domain_rq */
+ sd_rq->sd = sd;
}
+attach:
sched_domain_debug(sd, cpu);
rq_attach_root(rq, rd);
- tmp = rq->sd;
- rcu_assign_pointer(rq->sd, sd);
- destroy_sched_domains(tmp, cpu);
+ old_sd_rq = rq->sd_rq;
+ rcu_assign_pointer(rq->sd_rq, sd_rq);
+ destroy_sched_domain_rq(old_sd_rq, cpu);
update_top_cache_domain(cpu);
}
@@ -5697,12 +5722,14 @@ struct sd_data {
};
struct s_data {
+ struct sched_domain_rq ** __percpu sd_rq;
struct sched_domain ** __percpu sd;
struct root_domain *rd;
};
enum s_alloc {
sa_rootdomain,
+ sa_sd_rq,
sa_sd,
sa_sd_storage,
sa_none,
@@ -5937,7 +5964,7 @@ static void init_sched_groups_power(int cpu, struct sched_domain *sd)
return;
update_group_power(sd, cpu);
- atomic_set(&sg->sgp->nr_busy_cpus, sg->group_weight);
+ atomic_set(&sg->sgp->nr_busy_cpus, 0);
}
int __weak arch_sd_sibling_asym_packing(void)
@@ -6013,6 +6040,8 @@ static void set_domain_attribute(struct sched_domain *sd,
static void __sdt_free(const struct cpumask *cpu_map);
static int __sdt_alloc(const struct cpumask *cpu_map);
+static void __sdrq_free(const struct cpumask *cpu_map, struct s_data *d);
+static int __sdrq_alloc(const struct cpumask *cpu_map, struct s_data *d);
static void __free_domain_allocs(struct s_data *d, enum s_alloc what,
const struct cpumask *cpu_map)
@@ -6021,6 +6050,9 @@ static void __free_domain_allocs(struct s_data *d, enum s_alloc what,
case sa_rootdomain:
if (!atomic_read(&d->rd->refcount))
free_rootdomain(&d->rd->rcu); /* fall through */
+ case sa_sd_rq:
+ __sdrq_free(cpu_map, d); /* fall through */
+ free_percpu(d->sd_rq); /* fall through */
case sa_sd:
free_percpu(d->sd); /* fall through */
case sa_sd_storage:
@@ -6040,9 +6072,14 @@ static enum s_alloc __visit_domain_allocation_hell(struct s_data *d,
d->sd = alloc_percpu(struct sched_domain *);
if (!d->sd)
return sa_sd_storage;
+ d->sd_rq = alloc_percpu(struct sched_domain_rq *);
+ if (!d->sd_rq)
+ return sa_sd;
+ if (__sdrq_alloc(cpu_map, d))
+ return sa_sd_rq;
d->rd = alloc_rootdomain();
if (!d->rd)
- return sa_sd;
+ return sa_sd_rq;
return sa_rootdomain;
}
@@ -6468,6 +6505,47 @@ static void __sdt_free(const struct cpumask *cpu_map)
}
}
+static int __sdrq_alloc(const struct cpumask *cpu_map, struct s_data *d)
+{
+ int j;
+
+ for_each_cpu(j, cpu_map) {
+ struct sched_domain_rq *sd_rq;
+
+ sd_rq = kzalloc_node(sizeof(struct sched_domain_rq),
+ GFP_KERNEL, cpu_to_node(j));
+ if (!sd_rq)
+ return -ENOMEM;
+
+ *per_cpu_ptr(d->sd_rq, j) = sd_rq;
+ }
+
+ return 0;
+}
+
+static void __sdrq_free(const struct cpumask *cpu_map, struct s_data *d)
+{
+ int j;
+
+ for_each_cpu(j, cpu_map)
+ kfree(*per_cpu_ptr(d->sd_rq, j));
+}
+
+static void build_sched_domain_rq(struct s_data *d, int cpu)
+{
+ struct sched_domain_rq *sd_rq;
+ struct sched_domain *sd;
+
+ /* Attach sched_domain to sched_domain_rq */
+ sd = *per_cpu_ptr(d->sd, cpu);
+ sd_rq = *per_cpu_ptr(d->sd_rq, cpu);
+ sd_rq->sd = sd;
+#ifdef NO_HZ
+ /* Init flags */
+ set_bit(NOHZ_IDLE, rq_domain_flags(sd_rq));
+#endif
+}
+
struct sched_domain *build_sched_domain(struct sched_domain_topology_level *tl,
struct s_data *d, const struct cpumask *cpu_map,
struct sched_domain_attr *attr, struct sched_domain *child,
@@ -6497,6 +6575,7 @@ static int build_sched_domains(const struct cpumask *cpu_map,
struct sched_domain_attr *attr)
{
enum s_alloc alloc_state = sa_none;
+ struct sched_domain_rq *sd_rq;
struct sched_domain *sd;
struct s_data d;
int i, ret = -ENOMEM;
@@ -6549,11 +6628,18 @@ static int build_sched_domains(const struct cpumask *cpu_map,
}
}
+ /* Init objects that must follow the sched_domain lifecycle */
+ for_each_cpu(i, cpu_map) {
+ build_sched_domain_rq(&d, i);
+ }
+
/* Attach the domains */
rcu_read_lock();
for_each_cpu(i, cpu_map) {
- sd = *per_cpu_ptr(d.sd, i);
- cpu_attach_domain(sd, d.rd, i);
+ sd_rq = *per_cpu_ptr(d.sd_rq, i);
+ cpu_attach_domain(sd_rq, d.rd, i);
+ /* claim allocation of sched_domain_rq object */
+ *per_cpu_ptr(d.sd_rq, i) = NULL;
}
rcu_read_unlock();
@@ -6984,7 +7070,7 @@ void __init sched_init(void)
rq->last_load_update_tick = jiffies;
#ifdef CONFIG_SMP
- rq->sd = NULL;
+ rq->sd_rq = NULL;
rq->rd = NULL;
rq->cpu_power = SCHED_POWER_SCALE;
rq->post_schedule = 0;
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 7a33e59..2b294f1 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -5392,31 +5392,39 @@ static inline void nohz_balance_exit_idle(int cpu)
static inline void set_cpu_sd_state_busy(void)
{
+ struct sched_domain_rq *sd_rq;
struct sched_domain *sd;
int cpu = smp_processor_id();
- if (!test_bit(NOHZ_IDLE, nohz_flags(cpu)))
- return;
- clear_bit(NOHZ_IDLE, nohz_flags(cpu));
-
rcu_read_lock();
- for_each_domain(cpu, sd)
+ sd_rq = rcu_dereference_domain_rq(cpu);
+
+ if (!sd_rq || !test_bit(NOHZ_IDLE, rq_domain_flags(sd_rq)))
+ goto unlock;
+ clear_bit(NOHZ_IDLE, rq_domain_flags(sd_rq));
+
+ for_each_domain_from_rq(sd_rq, sd)
atomic_inc(&sd->groups->sgp->nr_busy_cpus);
+unlock:
rcu_read_unlock();
}
void set_cpu_sd_state_idle(void)
{
+ struct sched_domain_rq *sd_rq;
struct sched_domain *sd;
int cpu = smp_processor_id();
- if (test_bit(NOHZ_IDLE, nohz_flags(cpu)))
- return;
- set_bit(NOHZ_IDLE, nohz_flags(cpu));
-
rcu_read_lock();
- for_each_domain(cpu, sd)
+ sd_rq = rcu_dereference_domain_rq(cpu);
+
+ if (!sd_rq || test_bit(NOHZ_IDLE, rq_domain_flags(sd_rq)))
+ goto unlock;
+ set_bit(NOHZ_IDLE, rq_domain_flags(sd_rq));
+
+ for_each_domain_from_rq(sd_rq, sd)
atomic_dec(&sd->groups->sgp->nr_busy_cpus);
+unlock:
rcu_read_unlock();
}
@@ -5673,7 +5681,12 @@ static void run_rebalance_domains(struct softirq_action *h)
static inline int on_null_domain(int cpu)
{
- return !rcu_dereference_sched(cpu_rq(cpu)->sd);
+ struct sched_domain_rq *sd_rq =
+ rcu_dereference_sched(cpu_rq(cpu)->sd_rq);
+ struct sched_domain *sd = NULL;
+ if (sd_rq)
+ sd = sd_rq->sd;
+ return !sd;
}
/*
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index cc03cfd..ce27e3b 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -417,7 +417,7 @@ struct rq {
#ifdef CONFIG_SMP
struct root_domain *rd;
- struct sched_domain *sd;
+ struct sched_domain_rq *sd_rq;
unsigned long cpu_power;
@@ -505,21 +505,37 @@ DECLARE_PER_CPU(struct rq, runqueues);
#ifdef CONFIG_SMP
-#define rcu_dereference_check_sched_domain(p) \
+#define rcu_dereference_check_sched_domain_rq(p) \
rcu_dereference_check((p), \
lockdep_is_held(&sched_domains_mutex))
+#define rcu_dereference_domain_rq(cpu) \
+ rcu_dereference_check_sched_domain_rq(cpu_rq(cpu)->sd_rq)
+
+#define rcu_dereference_check_sched_domain(cpu) ({ \
+ struct sched_domain_rq *__sd_rq = rcu_dereference_domain_rq(cpu); \
+ struct sched_domain *__sd = NULL; \
+ if (__sd_rq) \
+ __sd = __sd_rq->sd; \
+ __sd; \
+})
+
+#define rq_domain_flags(sd_rq) (&sd_rq->flags)
+
/*
- * The domain tree (rq->sd) is protected by RCU's quiescent state transition.
+ * The domain tree (rq->sd_rq) is protected by RCU's quiescent state transition.
* See detach_destroy_domains: synchronize_sched for details.
*
* The domain tree of any CPU may only be accessed from within
* preempt-disabled sections.
*/
#define for_each_domain(cpu, __sd) \
- for (__sd = rcu_dereference_check_sched_domain(cpu_rq(cpu)->sd); \
+ for (__sd = rcu_dereference_check_sched_domain(cpu); \
__sd; __sd = __sd->parent)
+#define for_each_domain_from_rq(sd_rq, __sd) \
+ for (__sd = sd_rq->sd; __sd; __sd = __sd->parent)
+
#define for_each_lower_domain(sd) for (; sd; sd = sd->child)
/**
--
1.7.9.5
On 10 April 2013 11:44, Sedat Dilek <sedat.dilek(a)gmail.com> wrote:
> I found this "[RFC PATCH] kbuild: Build linux-tools package with 'make
> deb-pkg'" from February 2012.
> Can't say what happened to it...
Sedat,
Sorry for being late. I have been down with a fever and throat infection for a
few days. Still struggling with it..
There are a few things I tried. Firstly, the tag next-20130326 is bad, as it
contains some bad commits in the cpufreq core.
I then tried the latest linux-next/master on my ThinkPad (model name: Intel(R)
Core(TM) i7-2640M CPU @ 2.80GHz) and couldn't boot it up. My Ubuntu
just hung.
Then I tried Rafael's linux-next branch
079576f Merge branch 'pm-cpufreq-next' into linux-next
and couldn't find any issues with it. I am easily able to remove/add CPUs at
runtime.
Can you give this branch a try?
--
viresh
Hi,
This patch adds config fragments used to enable most of the
features used by big LITTLE IKS.
Signed-off-by: Naresh Kamboju <naresh.kamboju(a)linaro.org>
CC: Viresh Kumar <viresh.kumar(a)linaro.org>
CC: Andrey Konovalov <andrey.konovalov(a)linaro.org>
commit b547e2d829d13bb391b062dfd9837bdd17a8450c
Author: Naresh Kamboju <naresh.kamboju(a)linaro.org>
AuthorDate: Mon Apr 22 12:57:05 2013 +0530
Commit: Naresh Kamboju <naresh.kamboju(a)linaro.org>
CommitDate: Mon Apr 22 12:57:05 2013 +0530
configs: Add config fragments for big LITTLE IKS
This patch adds config fragments used to enable most of the features used by
big LITTLE IKS.
Signed-off-by: Naresh Kamboju <naresh.kamboju(a)linaro.org>
diff --git a/linaro/configs/big-LITTLE-IKS.conf
b/linaro/configs/big-LITTLE-IKS.conf
new file mode 100644
index 0000000..b067fde
--- /dev/null
+++ b/linaro/configs/big-LITTLE-IKS.conf
@@ -0,0 +1,5 @@
+CONFIG_BIG_LITTLE=y
+CONFIG_BL_SWITCHER=y
+CONFIG_ARM_DT_BL_CPUFREQ=y
+CONFIG_ARM_VEXPRESS_BL_CPUFREQ=y
+CONFIG_CPU_FREQ_GOV_USERSPACE=y
This patch series provides some code consolidation across the different
cpuidle drivers. It contains two parts: the first one is the removal of
the time keeping flag, and the second one is a common initialization routine.
All the drivers use the en_core_tk_irqen flag, which means it is not necessary
to make the time computation optional. We can remove this flag and assume the
cpuidle framework always manages this operation.
The cpuidle code initialization is duplicated across the different drivers in
the same manner.
The repeating pattern is:
SMP:
cpuidle_register_driver(drv);
for_each_possible_cpu(cpu) {
dev = per_cpu(cpuidle_device, cpu);
cpuidle_register_device(dev);
}
UP:
cpuidle_register_driver(drv);
cpuidle_register_device(dev);
As on a UP machine the macro 'for_each_cpu' is a one-iteration loop, the SMP
initialization loop also works for UP.
The patchset does some cleanup for different drivers in order to make the init
code the same. Then it introduces a generic function:
cpuidle_register(struct cpuidle_driver *drv, struct cpumask *cpumask)
The cpumask is for the coupled idle states.
The drivers are then modified to take into account this new function and
to remove the duplicated code.
The benefit is observable in the diffstat: 332 lines of code removed.
Tested-on: u8500
Tested-on: at91
Tested-on: intel i5
Tested-on: OMAP4
Compiled with and without CPU_IDLE for:
u8500, at91, davinci, exynos, imx5, imx6, kirkwood, multi_v7 (for calxeda),
omap2plus, s3c64, tegra1, tegra2, tegra3
Daniel Lezcano (18):
ARM: OMAP3: remove cpuidle_wrap_enter
cpuidle: remove en_core_tk_irqen flag
ARM: ux500: cpuidle: replace for_each_online_cpu by
for_each_possible_cpu
ARM: imx: cpuidle: create separate drivers for imx5/imx6
cpuidle: make a single register function for all
ARM: ux500: cpuidle: use init/exit common routine
ARM: at91: cpuidle: use init/exit common routine
ARM: OMAP3: cpuidle: use init/exit common routine
ARM: s3c64xx: cpuidle: use init/exit common routine
ARM: tegra1: cpuidle: use init/exit common routine
ARM: shmobile: cpuidle: use init/exit common routine
ARM: OMAP4: cpuidle: use init/exit common routine
ARM: tegra2: cpuidle: use init/exit common routine
ARM: tegra3: cpuidle: use init/exit common routine
ARM: calxeda: cpuidle: use init/exit common routine
ARM: kirkwood: cpuidle: use init/exit common routine
ARM: davinci: cpuidle: use init/exit common routine
ARM: imx: cpuidle: use init/exit common routine
arch/arm/mach-at91/cpuidle.c | 18 +--
arch/arm/mach-davinci/cpuidle.c | 21 +---
arch/arm/mach-exynos/cpuidle.c | 1 -
arch/arm/mach-imx/Makefile | 1 +
arch/arm/mach-imx/cpuidle-imx5.c | 40 +++++++
arch/arm/mach-imx/cpuidle-imx6q.c | 3 +-
arch/arm/mach-imx/cpuidle.c | 80 -------------
arch/arm/mach-imx/cpuidle.h | 10 +-
arch/arm/mach-imx/pm-imx5.c | 30 +----
arch/arm/mach-omap2/cpuidle34xx.c | 49 ++------
arch/arm/mach-omap2/cpuidle44xx.c | 23 +---
arch/arm/mach-s3c64xx/cpuidle.c | 15 +--
arch/arm/mach-shmobile/cpuidle.c | 11 +-
arch/arm/mach-shmobile/pm-sh7372.c | 1 -
arch/arm/mach-tegra/cpuidle-tegra114.c | 27 +----
arch/arm/mach-tegra/cpuidle-tegra20.c | 34 +-----
arch/arm/mach-tegra/cpuidle-tegra30.c | 28 +----
arch/arm/mach-ux500/cpuidle.c | 33 +-----
arch/powerpc/platforms/pseries/processor_idle.c | 1 -
arch/sh/kernel/cpu/shmobile/cpuidle.c | 1 -
arch/x86/kernel/apm_32.c | 1 -
drivers/acpi/processor_idle.c | 1 -
drivers/cpuidle/cpuidle-calxeda.c | 53 +--------
drivers/cpuidle/cpuidle-kirkwood.c | 18 +--
drivers/cpuidle/cpuidle.c | 137 ++++++++++++++---------
drivers/idle/intel_idle.c | 1 -
include/linux/cpuidle.h | 20 ++--
27 files changed, 162 insertions(+), 496 deletions(-)
create mode 100644 arch/arm/mach-imx/cpuidle-imx5.c
delete mode 100644 arch/arm/mach-imx/cpuidle.c
--
1.7.9.5
The parent node must be put after use to balance its reference count. This was
missing in the cpufreq-cpu0 driver. Fix it.
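The pattern being balanced, as a rough sketch (surrounding driver code
elided):
	struct device_node *parent;
	parent = of_find_node_by_path("/cpus");	/* takes a reference */
	if (!parent)
		return -ENOENT;
	/* ... look up the CPU node under 'parent' ... */
	of_node_put(parent);	/* balance the reference: the put this patch adds */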
Signed-off-by: Viresh Kumar <viresh.kumar(a)linaro.org>
---
drivers/cpufreq/cpufreq-cpu0.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/drivers/cpufreq/cpufreq-cpu0.c b/drivers/cpufreq/cpufreq-cpu0.c
index 31282fc..3ab8294 100644
--- a/drivers/cpufreq/cpufreq-cpu0.c
+++ b/drivers/cpufreq/cpufreq-cpu0.c
@@ -257,6 +257,7 @@ static int cpu0_cpufreq_probe(struct platform_device *pdev)
}
of_node_put(np);
+ of_node_put(parent);
return 0;
out_free_table:
--
1.7.12.rc2.18.g61b472e
== Linus Walleij linusw ==
=== Highlights ===
* Fixed the problem caused by the simultaneous upstreaming of
ab8500 debug code and multiplatform. Discussed this a bit:
the implicit kernel "optimistic change" policy expects most
changes not to collide, and expects maintainers to be very
responsive at all times.
* Readying the pinctrl tree for the merge window.
* Reviewed and merged a few of Fabio's backports to the
internal ST-Ericsson tree.
* Sent a final Integrator/AP PCI DT series. This is hanging
waiting for the infrastructure from Andrew Murray to be
merged first.
* Sent a set of patches probing the Nomadik MTU and
all Nomadik clocks from the device tree.
* I also have a pretty big device tree patch bundle for the U300
building up, but want to have it in a more complete state
before I post. The plan for U300 is: enable all for device tree,
delete board files, multiplatform in that order.
=== Plans ===
* A short paternity leave 6/5->9/5 in May.
* Find all regressions for ux500 lurking in the linux-next tree.
* Convert Nomadik pinctrl driver to register GPIO ranges
from the gpiochip side.
* Test the PL08x patches on the Ericsson Research
PB11MPCore and submit platform data for using
pl08x DMA on that platform.
* Get hands dirty with regmap.
=== Issues ===
* Things have been hectic internally at ST-Ericsson diverting me
from Linaro work.
* I am spending roughly 30-60 mins every day on internal review
work on internal baseline and mainline patches-to-be.
Thanks,
Linus Walleij
The noop function stubs are not necessary because the header file is only
included in files which are compiled when CONFIG_CPU_IDLE is on.
Signed-off-by: Daniel Lezcano <daniel.lezcano(a)linaro.org>
---
arch/arm/include/asm/cpuidle.h | 7 +------
1 file changed, 1 insertion(+), 6 deletions(-)
diff --git a/arch/arm/include/asm/cpuidle.h b/arch/arm/include/asm/cpuidle.h
index 2fca60a..7367787 100644
--- a/arch/arm/include/asm/cpuidle.h
+++ b/arch/arm/include/asm/cpuidle.h
@@ -1,13 +1,8 @@
#ifndef __ASM_ARM_CPUIDLE_H
#define __ASM_ARM_CPUIDLE_H
-#ifdef CONFIG_CPU_IDLE
extern int arm_cpuidle_simple_enter(struct cpuidle_device *dev,
- struct cpuidle_driver *drv, int index);
-#else
-static inline int arm_cpuidle_simple_enter(struct cpuidle_device *dev,
- struct cpuidle_driver *drv, int index) { return -ENODEV; }
-#endif
+ struct cpuidle_driver *drv, int index);
/* Common ARM WFI state */
#define ARM_CPUIDLE_WFI_STATE_PWR(p) {\
--
1.7.9.5
The current update of the rq's load can be erroneous when RT tasks are
involved.
The update of the load of a rq that becomes idle is done only if the avg_idle
is less than sysctl_sched_migration_cost. If RT tasks and short idle durations
alternate, the runnable_avg will not be updated correctly and the time will be
accounted as idle time when a CFS task wakes up.
A new idle_enter function is called when the next task is the idle function,
so the elapsed time will be accounted as run time in the load of the rq,
whatever the average idle time is. The function update_rq_runnable_avg is
removed from idle_balance.
When an RT task is scheduled on an idle CPU, the update of the rq's load is
not done when the rq exits the idle state, because CFS's functions are not
called. Then idle_balance, which is called just before entering the
idle function, updates the rq's load and makes the assumption that the
elapsed time since the last update was only running time.
As a consequence, the rq's load of a CPU that only runs a periodic RT task
is close to LOAD_AVG_MAX, whatever the running duration of the RT task is.
A new idle_exit function is called when the prev task is the idle function,
so the elapsed time will be accounted as idle time in the rq's load.
Changes since V5:
- Rename idle_enter/exit function to idle_enter/exit_fair
Changes since V4:
- Rebase on v3.9-rc6 instead of Steven Rostedt's patches
- Create the post_schedule_idle function that was previously created by Steven's patches
Changes since V3:
- Remove dependency on CONFIG_FAIR_GROUP_SCHED
- Add a new idle_enter function and create a post_schedule callback for
idle class
- Remove the update_runnable_avg from idle_balance
Changes since V2:
- remove useless definition for UP platform
- rebased on top of Steven Rostedt's patches :
https://lkml.org/lkml/2013/2/12/558
Changes since V1:
- move code out of schedule function and create a pre_schedule callback for
idle class instead.
Signed-off-by: Vincent Guittot <vincent.guittot(a)linaro.org>
---
kernel/sched/fair.c | 23 +++++++++++++++++++++--
kernel/sched/idle_task.c | 16 ++++++++++++++++
kernel/sched/sched.h | 12 ++++++++++++
3 files changed, 49 insertions(+), 2 deletions(-)
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 7a33e59..1de3df0 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -1562,6 +1562,27 @@ static inline void dequeue_entity_load_avg(struct cfs_rq *cfs_rq,
se->avg.decay_count = atomic64_read(&cfs_rq->decay_counter);
} /* migrations, e.g. sleep=0 leave decay_count == 0 */
}
+
+/*
+ * Update the rq's load with the elapsed running time before entering
+ * idle. if the last scheduled task is not a CFS task, idle_enter will
+ * be the only way to update the runnable statistic.
+ */
+void idle_enter_fair(struct rq *this_rq)
+{
+ update_rq_runnable_avg(this_rq, 1);
+}
+
+/*
+ * Update the rq's load with the elapsed idle time before a task is
+ * scheduled. if the newly scheduled task is not a CFS task, idle_exit will
+ * be the only way to update the runnable statistic.
+ */
+void idle_exit_fair(struct rq *this_rq)
+{
+ update_rq_runnable_avg(this_rq, 0);
+}
+
#else
static inline void update_entity_load_avg(struct sched_entity *se,
int update_cfs_rq) {}
@@ -5219,8 +5240,6 @@ void idle_balance(int this_cpu, struct rq *this_rq)
if (this_rq->avg_idle < sysctl_sched_migration_cost)
return;
- update_rq_runnable_avg(this_rq, 1);
-
/*
* Drop the rq->lock, but keep IRQ/preempt disabled.
*/
diff --git a/kernel/sched/idle_task.c b/kernel/sched/idle_task.c
index b6baf37..b8ce773 100644
--- a/kernel/sched/idle_task.c
+++ b/kernel/sched/idle_task.c
@@ -13,6 +13,16 @@ select_task_rq_idle(struct task_struct *p, int sd_flag, int flags)
{
return task_cpu(p); /* IDLE tasks as never migrated */
}
+
+static void pre_schedule_idle(struct rq *rq, struct task_struct *prev)
+{
+ idle_exit_fair(rq);
+}
+
+static void post_schedule_idle(struct rq *rq)
+{
+ idle_enter_fair(rq);
+}
#endif /* CONFIG_SMP */
/*
* Idle tasks are unconditionally rescheduled:
@@ -25,6 +35,10 @@ static void check_preempt_curr_idle(struct rq *rq, struct task_struct *p, int fl
static struct task_struct *pick_next_task_idle(struct rq *rq)
{
schedstat_inc(rq, sched_goidle);
+#ifdef CONFIG_SMP
+ /* Trigger the post schedule to do an idle_enter for CFS */
+ rq->post_schedule = 1;
+#endif
return rq->idle;
}
@@ -86,6 +100,8 @@ const struct sched_class idle_sched_class = {
#ifdef CONFIG_SMP
.select_task_rq = select_task_rq_idle,
+ .pre_schedule = pre_schedule_idle,
+ .post_schedule = post_schedule_idle,
#endif
.set_curr_task = set_curr_task_idle,
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index cc03cfd..8f1d80e 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -880,6 +880,18 @@ extern const struct sched_class idle_sched_class;
extern void trigger_load_balance(struct rq *rq, int cpu);
extern void idle_balance(int this_cpu, struct rq *this_rq);
+/*
+ * Only depends on SMP, FAIR_GROUP_SCHED may be removed when runnable_avg
+ * becomes useful in lb
+ */
+#if defined(CONFIG_FAIR_GROUP_SCHED)
+extern void idle_enter_fair(struct rq *this_rq);
+extern void idle_exit_fair(struct rq *this_rq);
+#else
+static inline void idle_enter_fair(struct rq *this_rq) {}
+static inline void idle_exit_fair(struct rq *this_rq) {}
+#endif
+
#else /* CONFIG_SMP */
static inline void idle_balance(int cpu, struct rq *rq)
--
1.7.9.5
Guenter and Anton,
As suggested by Anton, I rebased these patches to his latest battery-2.6 tree.
Thanks.
v7 -> v8 changes:
- rebase these patches to Anton's latest battery-2.6 tree.
v6 -> v7 changes:
- move exporting symbols from [5/5] to [4/5], which was a mistake.
v5 -> v6 changes:
- add a dependency on AB8500_BM in Kconfig
- fix wrong usage of clamp_val()
- export symbols for module compiling
v4 -> v5 changes:
- split the old [2/3]-ab8500-re-arrange-ab8500-power-and-temperature-data into
new three [2/5], [3/5] and [4/5] patches.
- hwmon driver minor coding style clean ups:
- {} usage in if-else statement in ab8500_read_sensor function
- index error fix in gpadc_monitor function
- fix issue of clamp_val() usage
- remove unnecessary else in function abx500_attrs_visible
- remove redundant print message about irq set up
- return the calling function return value directly in probe function
v3 -> v4 changes:
for patch [3/3]
- define delays in HZ
- update ab8500_read_sensor function, returning temp by parameter
- remove ab8500_is_visible function
- use clamp_val in set_min and set_max callback
- remove unnecessary locks in remove and suspend functions
- let abx500 and ab8500 use its own data structure
for patch [2/3]
- move the data tables from driver/power/ab8500_bmdata.c to
include/linux/power/ab8500.h
- rename driver/power/ab8500_bmdata.c to driver/power/ab8500_bm.c
- rename these variable names to eliminate CamelCase warnings
- add const attribute to these data
v2 -> v3 changes:
- Add interface for converting voltage to temperature
- Remove temp5 sensor since we cannot offer a temperature read interface for it
- Update hyst to use absolute temperature instead of a difference
- Add the 3/3 patch
v1 -> v2 changes:
- Add Documentation/hwmon/ab8500 and Documentation/hwmon/abx500
- Make devices which cannot report milli-Celsius invisible
- Add temp5_crit interface
- Re-work the old find_active_thresholds() to threshold_updated()
- Reset updated_min_alarm and updated_max_alarm at the end of each loop
- Update the hyst mechanism to make it work as a real hysteresis
- Remove non-standard attributes
- Re-order the operations sequence inside probe and remove functions
- Update all the lock usages to eliminate race conditions
- Make attribute indexes start from 0
Also changes:
- Since the old [1/2] "ARM: ux500: rename ab8500 to abx500 for hwmon driver"
has been merged by Samuel, I won't send it again.
- Add another new patch "ab8500_btemp: export two symbols" as [2/2] of this
patch set.
Hongbo Zhang (5):
ab8500_btemp: make ab8500_btemp_get* interfaces public
ab8500: power: eliminate CamelCase warning of some variables
ab8500: power: add const attributes to some data arrays
ab8500: power: export abx500_res_to_temp tables for hwmon
hwmon: add ST-Ericsson ABX500 hwmon driver
Documentation/hwmon/ab8500 | 22 ++
Documentation/hwmon/abx500 | 28 ++
drivers/hwmon/Kconfig | 13 +
drivers/hwmon/Makefile | 1 +
drivers/hwmon/ab8500.c | 206 +++++++++++++++
drivers/hwmon/abx500.c | 491 +++++++++++++++++++++++++++++++++++
drivers/hwmon/abx500.h | 69 +++++
drivers/power/ab8500_bmdata.c | 44 ++--
drivers/power/ab8500_btemp.c | 5 +-
drivers/power/ab8500_fg.c | 4 +-
include/linux/mfd/abx500.h | 6 +-
include/linux/mfd/abx500/ab8500-bm.h | 1 +
include/linux/power/ab8500.h | 16 ++
13 files changed, 882 insertions(+), 24 deletions(-)
create mode 100644 Documentation/hwmon/ab8500
create mode 100644 Documentation/hwmon/abx500
create mode 100644 drivers/hwmon/ab8500.c
create mode 100644 drivers/hwmon/abx500.c
create mode 100644 drivers/hwmon/abx500.h
create mode 100644 include/linux/power/ab8500.h
--
1.8.0
This series is a set of prerequisites for getting the new context
tracking subsystem and adaptive tickless support working on ARM.
Kevin Hilman (3):
cputime_nsecs: use math64.h for nsec resolution conversion helpers
init/Kconfig: virt CPU accounting: drop 64-bit requirement
ARM: Kconfig: allow virt CPU accounting
arch/arm/Kconfig | 1 +
include/asm-generic/cputime_nsecs.h | 28 +++++++++++++++++++---------
init/Kconfig | 2 +-
3 files changed, 21 insertions(+), 10 deletions(-)
--
1.8.2
=== David Long ===
=== Highlights ===
* Provided information on native kernel build lava stress test for new
landing team
* Separated uprobe tables and parsing code from kprobe sources. Testing
to see if this both works and makes sense.
=== Plans ===
* Continue with uprobe/kprobe
* Start building systemtap
* Need input on when to make flight arrangements for Dublin
=== Issues ===
* None
-dl
=== Highlights ===
* Worked out a few linaro.android merge issues w/ -rc6 found by Tixy
* Queued some community time patches
* Sent tglx pull request for lock-hold time reduction patchset, so my
current 3.10 queue is merged in -tip
* Did another review cycle with Serban's binder patches
* Talked with GregKH and Erik on best practices with driver
infrastructure in staging.
* Further discussions with Minchan on file backed vranges.
* Generated some minor fixups to issues noticed by Tixy to the Android
branch and pushed them to AOSP for review. So far 1/4 merged.
* Reviewed blueprints and held bi-weekly Android upstreaming hangout.
* Continued working on improving vrange patches to work with mmapped files.
=== Plans ===
* Finish prep for lsf-mm
* Attend and present at lsf-mm
* Still need to work on earlysuspend blog post
* Likely more discussion on perf/sched_clock() interfaces
=== Issues ===
* NA