This is the next planned step after the "add ->set_dev_mode" patchset.
It's not being sent out (before the earlier patchset is accepted by all) to receive
*more* criticism (I already got enough :)), but to give an overall view of where
we are heading.
You can choose to skip reviewing this and concentrate on the first patchset
instead, until that is upstreamed :)
Oh man, I am too scared now :)
Okay, here we go:
A clockevent device is used to service timers/hrtimers requests and the next
event (when it should fire) is decided by the timer/hrtimer expiring next. When
no timers/hrtimers are pending to be serviced, the expiry time is set to a
special value: KTIME_MAX. This means that no events are required for an indefinite
amount of time.
This would normally happen with NO_HZ_{IDLE|FULL} in both LOWRES/HIGHRES modes.
When expiry == KTIME_MAX, either we cancel the tick-sched hrtimer
(NOHZ_MODE_HIGHRES) or skip reprogramming clockevent device (NOHZ_MODE_LOWRES).
But, the clockevent device is already reprogrammed from tick-handler for next
tick.
So, the clockevent device will fire one more time. In NOHZ_MODE_HIGHRES, we will
consider it as a spurious interrupt and just return from hrtimer_interrupt(). In
NOHZ_MODE_LOWRES, we schedule the next tick again from tick_nohz_handler()?
(This is what I could read from the code, not very sure though. Otherwise, it
means that in NOHZ_MODE_LOWRES we are never tickless).
Ideally, as the clock event device is programmed in ONESHOT mode it should just
fire one more time and that's it. But many implementations (like arm_arch_timer,
etc) only have PERIODIC mode available and their drivers emulate ONESHOT over
that. Which means that on these platforms we will get spurious interrupts at
tick rate and that will hurt our tickless-ness badly.
At this time the clockevent device should be stopped, or its interrupts may be
masked in order to get these issues fixed.
A simple (yet hacky) solution to get this fixed could be: update
hrtimer_force_reprogram() to always reprogram clockevent device and update
clockevent drivers to STOP generating events (or delay it to max time) when
'expires' is set to KTIME_MAX. But the drawback here is that every clockevent
driver has to be hacked for this particular case and it's very easy for new ones
to miss this. Also, the NOHZ_MODE_LOWRES problem mentioned above wouldn't be fixed
by this.
However, Thomas suggested to add an optional mode: ONESHOT_STOPPED
(lkml.org/lkml/2014/5/9/508) to solve this problem.
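To make the idea concrete, here is a minimal user-space sketch of a stoppable one-shot device. It is not kernel code: every sketch_* name, the supports_stopped field and the INT64_MAX stand-in for KTIME_MAX are assumptions made for illustration only.

```c
#include <stdbool.h>
#include <stdint.h>

/* Stand-in for the kernel's KTIME_MAX ("no event needed"); an assumption. */
#define SKETCH_KTIME_MAX INT64_MAX

enum sketch_mode {
	SKETCH_MODE_ONESHOT,
	SKETCH_MODE_ONESHOT_STOPPED,
};

/* Hypothetical, heavily simplified clockevent device. */
struct sketch_clockevent {
	enum sketch_mode mode;
	bool supports_stopped;   /* driver opted in to the optional mode */
	int64_t next_event;      /* last expiry programmed into "hardware" */
	unsigned int programmed; /* number of times hardware was programmed */
};

/* Program the next event, or stop the device when nothing is pending. */
static void sketch_program_event(struct sketch_clockevent *dev, int64_t expires)
{
	if (expires == SKETCH_KTIME_MAX) {
		/* Only devices that implement the optional mode are stopped. */
		if (dev->supports_stopped)
			dev->mode = SKETCH_MODE_ONESHOT_STOPPED;
		return;
	}

	/* A stopped device has to be switched back before it is reprogrammed. */
	if (dev->mode == SKETCH_MODE_ONESHOT_STOPPED)
		dev->mode = SKETCH_MODE_ONESHOT;

	dev->next_event = expires;
	dev->programmed++;
}

/* Would the device raise an interrupt at time 'now'? */
static bool sketch_fires(const struct sketch_clockevent *dev, int64_t now)
{
	return dev->mode == SKETCH_MODE_ONESHOT && now >= dev->next_event;
}
```

In this sketch a device without supports_stopped still fires on its stale next_event after KTIME_MAX is requested, which models the spurious interrupt described above; a device with the optional mode stays quiet until it is programmed again.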
The first patch implements the required infrastructure to start/stop a clockevent
device. The third patch stops clockevent devices when no longer required and the second
patch starts them again once required.
The review order can be 1,3,2 for better understanding. Patch 2 was required
before 3 to keep 'git bisect' happy :)
Fourth patch is there to catch corner cases where we try to set next event while
being in ONESHOT_STOPPED mode. We will do a WARN_ON_ONCE() then. The last patch
modifies a sample driver (arm_arch_timer) to demonstrate/test this patchset.
Other drivers would be updated later.
Viresh Kumar (5):
clockevents: Introduce CLOCK_EVT_MODE_ONESHOT_STOPPED mode
tick-sched: switchback to ONESHOT mode if clockevent device is stopped
tick-sched: stop clockevent device when no longer required
clockevents: Catch event programming in ONESHOT_STOPPED mode
clocksource: arm_arch_timer: Add support for
CLOCK_EVT_MODE_ONESHOT_STOPPED
drivers/clocksource/arm_arch_timer.c | 1 +
include/linux/clockchips.h | 1 +
include/linux/tick.h | 2 ++
kernel/hrtimer.c | 53 +++++++++++++++++++++++++++++++++---
kernel/time/clockevents.c | 17 ++++++++++--
kernel/time/tick-oneshot.c | 20 ++++++++++++++
kernel/time/tick-sched.c | 4 +++
7 files changed, 92 insertions(+), 6 deletions(-)
--
2.0.0.rc2
Tegra's driver got updated a bit (00917dd cpufreq: Tegra: implement intermediate
frequency callbacks) and now implements the new 'intermediate freq' infrastructure of
the core. That commit updated the comments about when to call
clk_prepare_enable(pll_x_clk), but Doug wasn't satisfied with those comments and
said this:
> The "Though when target-freq is intermediate freq, we don't need to
> take this reference." makes me think that this function is actually
> called when target-freq is intermediate freq. I don't think it is,
> right?
For better clarity, just make that comment more explicit about when we call
tegra_target_intermediate(). I wasn't sure if we actually need a commit for this,
but anyway, let others decide whether it's worth it :)
Reported-by: Doug Anderson <dianders(a)chromium.org>
Signed-off-by: Viresh Kumar <viresh.kumar(a)linaro.org>
---
drivers/cpufreq/tegra-cpufreq.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/cpufreq/tegra-cpufreq.c b/drivers/cpufreq/tegra-cpufreq.c
index a5fbc0a..48bc89b 100644
--- a/drivers/cpufreq/tegra-cpufreq.c
+++ b/drivers/cpufreq/tegra-cpufreq.c
@@ -73,7 +73,7 @@ static int tegra_target_intermediate(struct cpufreq_policy *policy,
* off when we move the cpu off of it as enabling it again while we
* switch to it from tegra_target() would take additional time. Though
* when target-freq is intermediate freq, we don't need to take this
- * reference.
+ * reference and so this routine isn't called at all.
*/
clk_prepare_enable(pll_x_clk);
--
2.0.0.rc2
3.13.11.3 -stable review patch. If anyone has any objections, please let me know.
------------------
From: Viresh Kumar <viresh.kumar(a)linaro.org>
commit 27630532ef5ead28b98cfe28d8f95222ef91c2b7 upstream.
Since commit d689fe222 (NOHZ: Check for nohz active instead of nohz
enabled) the tick_nohz_switch_to_nohz() function returns because it
checks for the tick_nohz_active flag. This can't be set, because the
function itself sets it.
Undo the change in tick_nohz_switch_to_nohz().
Signed-off-by: Viresh Kumar <viresh.kumar(a)linaro.org>
Cc: linaro-kernel(a)lists.linaro.org
Cc: fweisbec(a)gmail.com
Cc: Arvind.Chauhan(a)arm.com
Cc: linaro-networking(a)linaro.org
Link: http://lkml.kernel.org/r/40939c05f2d65d781b92b20302b02243d0654224.139753798…
Signed-off-by: Thomas Gleixner <tglx(a)linutronix.de>
Signed-off-by: Kamal Mostafa <kamal(a)canonical.com>
---
kernel/time/tick-sched.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index ea20f7d..29b063b 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -970,7 +970,7 @@ static void tick_nohz_switch_to_nohz(void)
struct tick_sched *ts = &__get_cpu_var(tick_cpu_sched);
ktime_t next;
- if (!tick_nohz_active)
+ if (!tick_nohz_enabled)
return;
local_irq_disable();
--
1.9.1
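The reasoning in the changelog above can be illustrated with a small user-space sketch. It is not the kernel function: the two flags and both sketch_switch_* helpers are hypothetical stand-ins that only model "guarding on a flag that the guarded function itself is supposed to set".

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the two kernel flags. */
static bool sketch_nohz_enabled = true; /* NOHZ is allowed by configuration */
static bool sketch_nohz_active;         /* set once the switch has happened */

/* Buggy variant: checks the flag that only this function ever sets. */
static bool sketch_switch_buggy(void)
{
	if (!sketch_nohz_active)
		return false; /* always taken, so the flag can never become true */

	sketch_nohz_active = true;
	return true;
}

/* Fixed variant: checks whether NOHZ is enabled, then marks it active. */
static bool sketch_switch_fixed(void)
{
	if (!sketch_nohz_enabled)
		return false;

	sketch_nohz_active = true;
	return true;
}
```

Calling the buggy variant any number of times leaves sketch_nohz_active false, while the fixed variant sets it on the first call.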
This is a note to let you know that I have just added a patch titled
tick-sched: Check tick_nohz_enabled in tick_nohz_switch_to_nohz()
to the linux-3.13.y-queue branch of the 3.13.y.z extended stable tree
which can be found at:
http://kernel.ubuntu.com/git?p=ubuntu/linux.git;a=shortlog;h=refs/heads/lin…
This patch is scheduled to be released in version 3.13.11.3.
If you, or anyone else, feels it should not be added to this tree, please
reply to this email.
For more information about the 3.13.y.z tree, see
https://wiki.ubuntu.com/Kernel/Dev/ExtendedStable
Thanks.
-Kamal
------
From 05760233a5718be8d39485f78d44e50d6a721290 Mon Sep 17 00:00:00 2001
From: Viresh Kumar <viresh.kumar(a)linaro.org>
Date: Tue, 15 Apr 2014 10:54:41 +0530
Subject: tick-sched: Check tick_nohz_enabled in tick_nohz_switch_to_nohz()
commit 27630532ef5ead28b98cfe28d8f95222ef91c2b7 upstream.
Since commit d689fe222 (NOHZ: Check for nohz active instead of nohz
enabled) the tick_nohz_switch_to_nohz() function returns because it
checks for the tick_nohz_active flag. This can't be set, because the
function itself sets it.
Undo the change in tick_nohz_switch_to_nohz().
Signed-off-by: Viresh Kumar <viresh.kumar(a)linaro.org>
Cc: linaro-kernel(a)lists.linaro.org
Cc: fweisbec(a)gmail.com
Cc: Arvind.Chauhan(a)arm.com
Cc: linaro-networking(a)linaro.org
Link: http://lkml.kernel.org/r/40939c05f2d65d781b92b20302b02243d0654224.139753798…
Signed-off-by: Thomas Gleixner <tglx(a)linutronix.de>
Signed-off-by: Kamal Mostafa <kamal(a)canonical.com>
---
kernel/time/tick-sched.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index ea20f7d..29b063b 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -970,7 +970,7 @@ static void tick_nohz_switch_to_nohz(void)
struct tick_sched *ts = &__get_cpu_var(tick_cpu_sched);
ktime_t next;
- if (!tick_nohz_active)
+ if (!tick_nohz_enabled)
return;
local_irq_disable();
--
1.9.1
This is a note to let you know that I have just added a patch titled
tick-common: Fix wrong check in tick_check_replacement()
to the linux-3.13.y-queue branch of the 3.13.y.z extended stable tree
which can be found at:
http://kernel.ubuntu.com/git?p=ubuntu/linux.git;a=shortlog;h=refs/heads/lin…
This patch is scheduled to be released in version 3.13.11.3.
If you, or anyone else, feels it should not be added to this tree, please
reply to this email.
For more information about the 3.13.y.z tree, see
https://wiki.ubuntu.com/Kernel/Dev/ExtendedStable
Thanks.
-Kamal
------
From a7a150dca33ba90ecf23eb27b055c1a3267cb263 Mon Sep 17 00:00:00 2001
From: Viresh Kumar <viresh.kumar(a)linaro.org>
Date: Tue, 15 Apr 2014 10:54:37 +0530
Subject: tick-common: Fix wrong check in tick_check_replacement()
commit 521c42990e9d561ed5ed9f501f07639d0512b3c9 upstream.
tick_check_replacement() returns if a replacement of clock_event_device is
possible or not. It does this as the first check:
if (tick_check_percpu(curdev, newdev, smp_processor_id()))
return false;
That's wrong. tick_check_percpu() returns true when the device is
usable. Check for false instead.
[ tglx: Massaged changelog ]
Signed-off-by: Viresh Kumar <viresh.kumar(a)linaro.org>
Cc: linaro-kernel(a)lists.linaro.org
Cc: fweisbec(a)gmail.com
Cc: Arvind.Chauhan(a)arm.com
Cc: linaro-networking(a)linaro.org
Link: http://lkml.kernel.org/r/486a02efe0246635aaba786e24b42d316438bf3b.139753798…
Signed-off-by: Thomas Gleixner <tglx(a)linutronix.de>
Signed-off-by: Kamal Mostafa <kamal(a)canonical.com>
---
kernel/time/tick-common.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/kernel/time/tick-common.c b/kernel/time/tick-common.c
index 162b03a..425bfae 100644
--- a/kernel/time/tick-common.c
+++ b/kernel/time/tick-common.c
@@ -275,7 +275,7 @@ static bool tick_check_preferred(struct clock_event_device *curdev,
bool tick_check_replacement(struct clock_event_device *curdev,
struct clock_event_device *newdev)
{
- if (tick_check_percpu(curdev, newdev, smp_processor_id()))
+ if (!tick_check_percpu(curdev, newdev, smp_processor_id()))
return false;
return tick_check_preferred(curdev, newdev);
--
1.9.1
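A minimal user-space sketch of the corrected control flow follows. The struct and the sketch_* helpers are hypothetical simplifications (a single percpu_ok flag and a rating comparison), not the kernel's actual checks.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified clock event device. */
struct sketch_dev {
	bool percpu_ok; /* device is usable on the current CPU */
	int rating;     /* higher is better */
};

/* Returns true when 'newdev' is usable on this CPU. */
static bool sketch_check_percpu(const struct sketch_dev *newdev)
{
	return newdev->percpu_ok;
}

/* Returns true when 'newdev' is preferable to 'curdev' (or there is none). */
static bool sketch_check_preferred(const struct sketch_dev *curdev,
				   const struct sketch_dev *newdev)
{
	return curdev == NULL || newdev->rating > curdev->rating;
}

/* Corrected order of checks: bail out only when the device is NOT usable. */
static bool sketch_check_replacement(const struct sketch_dev *curdev,
				     const struct sketch_dev *newdev)
{
	if (!sketch_check_percpu(newdev))
		return false;

	return sketch_check_preferred(curdev, newdev);
}
```

With the inverted check from before the fix, a usable device would be rejected immediately and an unusable one would be handed to the preference check.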
Dear all,
I have a question about the following code in arch/arm64/kernel/entry.S:
/*
* EL1 mode handlers.
*/
el1_sync:
kernel_entry 1
mrs x1, esr_el1 // read the syndrome register
lsr x24, x1, #ESR_EL1_EC_SHIFT // exception class
cmp x24, #ESR_EL1_EC_DABT_EL1 // data abort in EL1
b.eq el1_da
cmp x24, #ESR_EL1_EC_SYS64 // configurable trap
b.eq el1_undef
cmp x24, #ESR_EL1_EC_SP_ALIGN // stack alignment exception
b.eq el1_sp_pc
el1_sp_pc:
/*
* Stack or PC alignment exception handling
*/
mrs x0, far_el1
- mov x1, x25 ==> this is an extra operation
mov x2, sp
b do_sp_pc_abort //Jump to C Exception handler
/* The C exception handler */
asmlinkage void __exception do_sp_pc_abort(unsigned long addr,
unsigned int esr,
struct pt_regs *regs)
{
...
}
We use the x1 register to store the value of ESR, and check that value to decide which exception handler to jump to.
There is a weird part in the stack alignment exception handler (el1_sp_pc):
why do we need to move x25 to x1?
The ESR has already been stored in x1 and should be passed directly to the do_sp_pc_abort() function.
"mov x1, x25" is an extra operation, and do_sp_pc_abort() would get the wrong value of esr...
I'm not sure whether I'm right or not; I hope someone can take a look at it. Thanks.
BRs
andy
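For readers following the dispatch logic in the question, here is a user-space C sketch of how the exception class is extracted from the syndrome value and mapped to a handler. The numeric constants (shift of 26, classes 0x18, 0x25 and 0x26) are quoted from memory of the ARMv8 ESR encoding and should be treated as assumptions; the sketch_* names and the handler enum are hypothetical.

```c
#include <stdint.h>

/* Assumed ARMv8 encoding: the exception class sits in ESR bits [31:26]. */
#define SKETCH_EC_SHIFT    26
#define SKETCH_EC_SYS64    0x18u /* configurable trap (assumed value) */
#define SKETCH_EC_DABT_EL1 0x25u /* data abort in EL1 (assumed value) */
#define SKETCH_EC_SP_ALIGN 0x26u /* stack alignment (assumed value) */

/* Hypothetical labels for the branch targets in the assembly above. */
enum sketch_handler {
	SKETCH_EL1_DA,
	SKETCH_EL1_UNDEF,
	SKETCH_EL1_SP_PC,
	SKETCH_EL1_INV,
};

/* Equivalent of "lsr x24, x1, #ESR_EL1_EC_SHIFT". */
static unsigned int sketch_esr_ec(uint32_t esr)
{
	return esr >> SKETCH_EC_SHIFT;
}

/* Equivalent of the cmp/b.eq chain in el1_sync. */
static enum sketch_handler sketch_el1_sync(uint32_t esr)
{
	switch (sketch_esr_ec(esr)) {
	case SKETCH_EC_DABT_EL1:
		return SKETCH_EL1_DA;
	case SKETCH_EC_SYS64:
		return SKETCH_EL1_UNDEF;
	case SKETCH_EC_SP_ALIGN:
		return SKETCH_EL1_SP_PC;
	default:
		return SKETCH_EL1_INV;
	}
}
```

The point relevant to the question is that the handler is selected from the value read into x1, so the same value is what the C handler's second argument (esr) should carry.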
cpufreq-cpu0 uses thermal framework to register a cooling device, but doesn't
depend on it as there are dummy calls provided by thermal layer when
CONFIG_THERMAL=n. And when these calls fail, the driver is still usable.
Similar explanation is valid for regulators as well. We do have dummy calls
available for regulator APIs and the driver can work even when those calls
fail.
So, we don't really need to mention thermal and regulators as a dependency for
cpufreq-cpu0 in Kconfig as platforms without support for thermal/regulator can
also use this driver. Remove this dependency.
Signed-off-by: Viresh Kumar <viresh.kumar(a)linaro.org>
---
Rafael,
The dependency patches from regulators core are already pushed to Linus's tree
and so this patch can go in now.
drivers/cpufreq/Kconfig | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/cpufreq/Kconfig b/drivers/cpufreq/Kconfig
index 1fbe11f..e473d65 100644
--- a/drivers/cpufreq/Kconfig
+++ b/drivers/cpufreq/Kconfig
@@ -185,7 +185,7 @@ config CPU_FREQ_GOV_CONSERVATIVE
config GENERIC_CPUFREQ_CPU0
tristate "Generic CPU0 cpufreq driver"
- depends on HAVE_CLK && REGULATOR && OF && THERMAL && CPU_THERMAL
+ depends on HAVE_CLK && OF
select PM_OPP
help
This adds a generic cpufreq driver for CPU0 frequency management.
--
2.0.0.rc2
Sometimes boot loaders set the CPU frequency to a value outside of the frequency table
known to the cpufreq core. In such cases the CPU might be unstable if it has to run
at that frequency for a long time, and so it's better to set it to a
frequency which is specified in the frequency table.
Sachin recently found this problem with the cpufreq-cpu0 driver when he was testing
it for Exynos.
Set the CPUFREQ_NEED_INITIAL_FREQ_CHECK flag for the cpufreq-cpu0 driver.
Reported-and-tested-by: Sachin Kamat <sachin.kamat(a)linaro.org>
Signed-off-by: Viresh Kumar <viresh.kumar(a)linaro.org>
---
drivers/cpufreq/cpufreq-cpu0.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/cpufreq/cpufreq-cpu0.c b/drivers/cpufreq/cpufreq-cpu0.c
index 09b9129..ee1ae30 100644
--- a/drivers/cpufreq/cpufreq-cpu0.c
+++ b/drivers/cpufreq/cpufreq-cpu0.c
@@ -104,7 +104,7 @@ static int cpu0_cpufreq_init(struct cpufreq_policy *policy)
}
static struct cpufreq_driver cpu0_cpufreq_driver = {
- .flags = CPUFREQ_STICKY,
+ .flags = CPUFREQ_STICKY | CPUFREQ_NEED_INITIAL_FREQ_CHECK,
.verify = cpufreq_generic_frequency_table_verify,
.target_index = cpu0_set_target,
.get = cpufreq_generic_get,
--
2.0.0.rc2
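As an illustration of what an initial frequency check amounts to, here is a user-space sketch. The helper name and the selection policy (keep a listed frequency; otherwise take the lowest table entry above the boot frequency, falling back to the highest entry) are assumptions for this example and are not claimed to match the cpufreq core exactly.

```c
#include <stddef.h>

/*
 * Hypothetical helper: return a frequency (in kHz) from 'table' to run at.
 * If 'boot_khz' is listed, keep it. Otherwise pick the lowest entry above
 * it, or the highest entry when none is above. The policy is an assumption.
 */
static unsigned int sketch_initial_freq(const unsigned int *table, size_t n,
					unsigned int boot_khz)
{
	unsigned int above = 0;   /* lowest entry greater than boot_khz */
	unsigned int highest = 0; /* highest entry seen so far */
	size_t i;

	for (i = 0; i < n; i++) {
		if (table[i] == boot_khz)
			return boot_khz; /* already a valid table frequency */
		if (table[i] > highest)
			highest = table[i];
		if (table[i] > boot_khz && (above == 0 || table[i] < above))
			above = table[i];
	}

	return above ? above : highest;
}
```

For a table of 200000, 400000 and 800000 kHz, a boot frequency of 500000 kHz would be replaced by 800000 kHz in this sketch.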
'copy_prev_load' was recently added by commit: 18b46ab (cpufreq: governor: Be
friendly towards latency-sensitive bursty workloads).
It actually is a bit redundant as we also have 'prev_load' which can store any
integer value and can be used instead of 'copy_prev_load' by setting it to zero.
True load can also turn out to be zero during long idle intervals (and hence the
actual value of 'prev_load' and the overloaded value can clash). However this is
not a problem because, if the true load was really zero in the previous
interval, it makes sense to evaluate the load afresh for the current interval
rather than copying the previous load.
So, drop 'copy_prev_load' and use 'prev_load' instead.
Update comments as well to make it more clear.
There is another change here which was probably missed by Srivatsa during the
last version of updates he made. The unlikely in the 'if' statement was covering
only half of the condition and the whole line should actually come under it.
Also checkpatch is made more silent as it was reporting this (--strict option):
CHECK: Alignment should match open parenthesis
+ if (unlikely(wall_time > (2 * sampling_rate) &&
+ j_cdbs->prev_load)) {
Signed-off-by: Viresh Kumar <viresh.kumar(a)linaro.org>
---
Resend: Updated comments/logs as suggested by Srivatsa.
drivers/cpufreq/cpufreq_governor.c | 19 ++++++++++++++-----
drivers/cpufreq/cpufreq_governor.h | 9 +++++----
2 files changed, 19 insertions(+), 9 deletions(-)
diff --git a/drivers/cpufreq/cpufreq_governor.c b/drivers/cpufreq/cpufreq_governor.c
index 9004450..1b44496 100644
--- a/drivers/cpufreq/cpufreq_governor.c
+++ b/drivers/cpufreq/cpufreq_governor.c
@@ -131,15 +131,25 @@ void dbs_check_cpu(struct dbs_data *dbs_data, int cpu)
* timer would not have fired during CPU-idle periods. Hence
* an unusually large 'wall_time' (as compared to the sampling
* rate) indicates this scenario.
+ *
+ * prev_load can be zero in two cases and we must recalculate it
+ * for both cases:
+ * - during long idle intervals
+ * - explicitly set to zero
*/
- if (unlikely(wall_time > (2 * sampling_rate)) &&
- j_cdbs->copy_prev_load) {
+ if (unlikely(wall_time > (2 * sampling_rate) &&
+ j_cdbs->prev_load)) {
load = j_cdbs->prev_load;
- j_cdbs->copy_prev_load = false;
+
+ /*
+ * Perform a destructive copy, to ensure that we copy
+ * the previous load only once, upon the first wake-up
+ * from idle.
+ */
+ j_cdbs->prev_load = 0;
} else {
load = 100 * (wall_time - idle_time) / wall_time;
j_cdbs->prev_load = load;
- j_cdbs->copy_prev_load = true;
}
if (load > max_load)
@@ -373,7 +383,6 @@ int cpufreq_governor_dbs(struct cpufreq_policy *policy,
(j_cdbs->prev_cpu_wall - j_cdbs->prev_cpu_idle);
j_cdbs->prev_load = 100 * prev_load /
(unsigned int) j_cdbs->prev_cpu_wall;
- j_cdbs->copy_prev_load = true;
if (ignore_nice)
j_cdbs->prev_cpu_nice =
diff --git a/drivers/cpufreq/cpufreq_governor.h b/drivers/cpufreq/cpufreq_governor.h
index c2a5b7e..cc401d1 100644
--- a/drivers/cpufreq/cpufreq_governor.h
+++ b/drivers/cpufreq/cpufreq_governor.h
@@ -134,12 +134,13 @@ struct cpu_dbs_common_info {
u64 prev_cpu_idle;
u64 prev_cpu_wall;
u64 prev_cpu_nice;
- unsigned int prev_load;
/*
- * Flag to ensure that we copy the previous load only once, upon the
- * first wake-up from idle.
+ * Used to keep track of load in the previous interval. However, when
+ * explicitly set to zero, it is used as a flag to ensure that we copy
+ * the previous load to the current interval only once, upon the first
+ * wake-up from idle.
*/
- bool copy_prev_load;
+ unsigned int prev_load;
struct cpufreq_policy *cur_policy;
struct delayed_work work;
/*
--
2.0.0.rc2
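The "destructive copy" described in this patch can be tried out in user space with the sketch below. The struct and function names are hypothetical; the branch structure and the load formula mirror the hunk above, and the early return for a zero or inconsistent wall_time is an added assumption to keep the example safe to run.

```c
/* Hypothetical cut-down version of the per-CPU governor data. */
struct sketch_cdbs {
	unsigned int prev_load; /* 0 doubles as "do not copy" */
};

/* Evaluate the load for one sampling interval (times in microseconds). */
static unsigned int sketch_eval_load(struct sketch_cdbs *cdbs,
				     unsigned int wall_time,
				     unsigned int idle_time,
				     unsigned int sampling_rate)
{
	unsigned int load;

	/* Assumption: skip intervals that cannot yield a meaningful load. */
	if (wall_time == 0 || wall_time < idle_time)
		return 0;

	if (wall_time > 2 * sampling_rate && cdbs->prev_load) {
		/* Waking up from a long idle period: reuse the old load once. */
		load = cdbs->prev_load;
		cdbs->prev_load = 0; /* destructive copy */
	} else {
		load = 100 * (wall_time - idle_time) / wall_time;
		cdbs->prev_load = load;
	}

	return load;
}
```

With prev_load at 80 and a long idle interval, the first call returns 80 and clears prev_load, so the next long interval is evaluated afresh rather than copied again.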
'copy_prev_load' was recently added by commit: 18b46ab (cpufreq: governor: Be
friendly towards latency-sensitive bursty workloads).
It actually is a bit redundant as we also have 'prev_load' which can store any
integer value and can be used instead of 'copy_prev_load' by setting it to zero
when we don't want to use previous load.
So, drop 'copy_prev_load' and use 'prev_load' instead.
Update comments as well to make it more clear.
There is another change here which was probably missed by Srivatsa during the
last version of updates he made. The unlikely in the 'if' statement was covering
only half of the condition and the whole line should actually come under it.
Also checkpatch is made more silent as it was reporting this (--strict option):
CHECK: Alignment should match open parenthesis
+ if (unlikely(wall_time > (2 * sampling_rate) &&
+ j_cdbs->prev_load)) {
Signed-off-by: Viresh Kumar <viresh.kumar(a)linaro.org>
---
drivers/cpufreq/cpufreq_governor.c | 13 ++++++++-----
drivers/cpufreq/cpufreq_governor.h | 8 ++++----
2 files changed, 12 insertions(+), 9 deletions(-)
diff --git a/drivers/cpufreq/cpufreq_governor.c b/drivers/cpufreq/cpufreq_governor.c
index 9004450..a1ad804 100644
--- a/drivers/cpufreq/cpufreq_governor.c
+++ b/drivers/cpufreq/cpufreq_governor.c
@@ -132,14 +132,18 @@ void dbs_check_cpu(struct dbs_data *dbs_data, int cpu)
* an unusually large 'wall_time' (as compared to the sampling
* rate) indicates this scenario.
*/
- if (unlikely(wall_time > (2 * sampling_rate)) &&
- j_cdbs->copy_prev_load) {
+ if (unlikely(wall_time > (2 * sampling_rate) &&
+ j_cdbs->prev_load)) {
load = j_cdbs->prev_load;
- j_cdbs->copy_prev_load = false;
+
+ /*
+ * Ensure that we copy the previous load only once, upon
+ * the first wake-up from idle.
+ */
+ j_cdbs->prev_load = 0;
} else {
load = 100 * (wall_time - idle_time) / wall_time;
j_cdbs->prev_load = load;
- j_cdbs->copy_prev_load = true;
}
if (load > max_load)
@@ -373,7 +377,6 @@ int cpufreq_governor_dbs(struct cpufreq_policy *policy,
(j_cdbs->prev_cpu_wall - j_cdbs->prev_cpu_idle);
j_cdbs->prev_load = 100 * prev_load /
(unsigned int) j_cdbs->prev_cpu_wall;
- j_cdbs->copy_prev_load = true;
if (ignore_nice)
j_cdbs->prev_cpu_nice =
diff --git a/drivers/cpufreq/cpufreq_governor.h b/drivers/cpufreq/cpufreq_governor.h
index c2a5b7e..d3082ee 100644
--- a/drivers/cpufreq/cpufreq_governor.h
+++ b/drivers/cpufreq/cpufreq_governor.h
@@ -134,12 +134,12 @@ struct cpu_dbs_common_info {
u64 prev_cpu_idle;
u64 prev_cpu_wall;
u64 prev_cpu_nice;
- unsigned int prev_load;
/*
- * Flag to ensure that we copy the previous load only once, upon the
- * first wake-up from idle.
+ * Used to store system load before going into idle, when set to zero:
+ * used as a flag to ensure that we copy the previous load only once,
+ * upon the first wake-up from idle.
*/
- bool copy_prev_load;
+ unsigned int prev_load;
struct cpufreq_policy *cur_policy;
struct delayed_work work;
/*
--
2.0.0.rc2
Hi,
Here goes the fifth version, and it's much more lightweight than the earlier
versions. I hope I haven't missed any review comments here :)
More or less all patches are updated to get rid of all unrelated changes, like
removing unsupported modes or not disabling events for default cases.
V4->V5: don't change the behavior of the 'default case' apart from returning an error; it
must behave exactly the same way as it did earlier.
Here goes the mainline version of the cover letter:
A clockevent device should be stopped, or its events should be masked, if next
event is expected at KTIME_MAX, i.e. no events are required for very long time.
This would normally happen with NO_HZ (both NO_HZ_IDLE and NO_HZ_FULL) when
tick-sched timer is removed and we don't have any more timers scheduled for
long.
If we don't STOP the clockevent device, we are guaranteed to receive at least one
spurious interrupt, as hrtimer_force_reprogram() isn't reprogramming the event
device (on expires == KTIME_MAX). Depending on the particular implementation of the
clockevent device, this can be a fake interrupt at tick rate (when the driver is
emulating ONESHOT over PERIODIC mode; this was observed on at least one
implementation: arm_arch_timer.c).
A simple (yet hacky) solution to get this fixed could be: update
hrtimer_force_reprogram() to always reprogram clockevent device and update
clockevent drivers to STOP generating events (or delay it to max time), when
'expires' is set to KTIME_MAX. But the drawback here is that every clockevent
driver has to be hacked for this particular case and it's very easy for new ones
to miss this.
However, Thomas suggested to add an optional mode: ONESHOT_STOPPED
(lkml.org/lkml/2014/5/9/508) to solve this problem.
With this proposal, introducing a new ONESHOT_STOPPED mode would require the
core to know whether the platform implements this mode so it could be
reprogrammed later.
In order for the core to tell if the mode is implemented, ->set_mode() callback
needs to be able to return success or failure.
To change return type of set_mode(), Thomas suggested to implement another
callback: ->set_dev_mode(), with return type 'int'. We can then convert
clockevent drivers to use this interface instead of existing ->set_mode() and
then finally remove ->set_mode()'s support.
This patchset first adds another callback with return capability,
set_dev_mode(), then it migrates all drivers one by one to it and finally
removes earlier callback set_mode() when it has no more users.
FIXME: There are two issues which *may* need to be fixed separately.
1. A few drivers still have 'switch cases' for handling unsupported modes (like:
drivers not supporting ONESHOT have a 'case: ONESHOT' with or without a
WARN/BUG/pr_err/etc). As we now have a WARN() in core, we don't need these in
individual drivers and they can be removed.
2. A few clockevent drivers disable clock events as soon as we enter their
->set_dev_mode() callbacks and so they stay disabled for the 'default case' as
well. We *may* need to fix these drivers in case we don't want them to
disable events in the 'default case'. After this series we will never hit the
'default case' as we are handling all cases separately, but it might be
required to fix them before the ONESHOT_STOPPED series gets in.
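The motivation for a mode callback that can fail is easier to see in a small user-space sketch. Everything here is hypothetical (the sketch_* names, the enum, and the choice of -ENOSYS as the error code); it only shows how a core layer can learn from the return value that a driver does not implement a mode.

```c
#include <errno.h>

/* Hypothetical mode list; the names are not the kernel's. */
enum sketch_mode {
	SKETCH_PERIODIC,
	SKETCH_ONESHOT,
	SKETCH_ONESHOT_STOPPED,
};

struct sketch_dev {
	enum sketch_mode mode;
	/* Callback with an 'int' return type, so drivers can report failure. */
	int (*set_dev_mode)(enum sketch_mode mode, struct sketch_dev *dev);
};

/* A driver that implements only the two classic modes. */
static int sketch_driver_set_mode(enum sketch_mode mode, struct sketch_dev *dev)
{
	(void)dev; /* a real driver would touch hardware registers here */

	switch (mode) {
	case SKETCH_PERIODIC:
	case SKETCH_ONESHOT:
		return 0;
	default:
		return -ENOSYS; /* unsupported mode; error code is an assumption */
	}
}

/* Core helper: record the new mode only if the driver accepted it. */
static int sketch_core_set_mode(struct sketch_dev *dev, enum sketch_mode mode)
{
	int ret = dev->set_dev_mode(mode, dev);

	if (ret)
		return ret;

	dev->mode = mode;
	return 0;
}
```

When the driver rejects SKETCH_ONESHOT_STOPPED, the core keeps the previous mode and can fall back to its old behavior, which a void callback could not signal.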
Viresh Kumar (8):
clockevents: add ->set_dev_mode() to struct clock_event_device
clockevents: arm: migrate to ->set_dev_mode()
clockevents: mips: migrate to ->set_dev_mode()
clockevents: sparc: migrate to ->set_dev_mode()
clockevents: x86: migrate to ->set_dev_mode()
clockevents: drivers/: migrate to ->set_dev_mode()
clockevents: misc: migrate to ->set_dev_mode()
clockevents: remove ->set_mode() from struct clock_event_device
arch/alpha/kernel/time.c | 31 ++++++++++++++++++---
arch/arc/kernel/time.c | 11 +++++---
arch/arm/common/timer-sp.c | 10 ++++---
arch/arm/kernel/smp_twd.c | 11 +++++---
arch/arm/mach-at91/at91rm9200_time.c | 9 +++++--
arch/arm/mach-at91/at91sam926x_time.c | 8 ++++--
arch/arm/mach-clps711x/common.c | 7 +++--
arch/arm/mach-cns3xxx/core.c | 12 ++++++---
arch/arm/mach-davinci/time.c | 7 +++--
arch/arm/mach-footbridge/dc21285-timer.c | 7 +++--
arch/arm/mach-gemini/time.c | 7 ++---
arch/arm/mach-imx/epit.c | 7 +++--
arch/arm/mach-imx/time.c | 7 +++--
arch/arm/mach-integrator/integrator_ap.c | 9 ++++---
arch/arm/mach-ixp4xx/common.c | 9 ++++---
arch/arm/mach-ks8695/time.c | 22 ++++++++++-----
arch/arm/mach-lpc32xx/timer.c | 7 +++--
arch/arm/mach-mmp/time.c | 11 +++++---
arch/arm/mach-netx/time.c | 9 ++++---
arch/arm/mach-omap1/time.c | 7 +++--
arch/arm/mach-omap1/timer32k.c | 7 +++--
arch/arm/mach-omap2/timer.c | 7 +++--
arch/arm/mach-pxa/time.c | 7 +++--
arch/arm/mach-sa1100/time.c | 7 +++--
arch/arm/mach-spear/time.c | 10 +++----
arch/arm/mach-w90x900/time.c | 8 ++++--
arch/arm/plat-iop/time.c | 9 ++++---
arch/arm/plat-orion/time.c | 20 ++++++++++----
arch/avr32/kernel/time.c | 7 ++---
arch/blackfin/kernel/time-ts.c | 14 +++++++---
arch/c6x/platforms/timer64.c | 7 +++--
arch/hexagon/kernel/time.c | 11 +++++---
arch/m68k/platform/coldfire/pit.c | 7 +++--
arch/microblaze/kernel/timer.c | 7 +++--
arch/mips/alchemy/common/time.c | 14 ++++++++--
arch/mips/include/asm/cevt-r4k.h | 2 +-
arch/mips/jazz/irq.c | 14 ++++++++--
arch/mips/jz4740/time.c | 9 ++++---
arch/mips/kernel/cevt-bcm1480.c | 9 ++++---
arch/mips/kernel/cevt-ds1287.c | 10 +++++--
arch/mips/kernel/cevt-gic.c | 14 ++++++++--
arch/mips/kernel/cevt-gt641xx.c | 12 ++++++---
arch/mips/kernel/cevt-r4k.c | 14 ++++++++--
arch/mips/kernel/cevt-sb1250.c | 9 ++++---
arch/mips/kernel/cevt-smtc.c | 2 +-
arch/mips/kernel/cevt-txx9.c | 7 +++--
arch/mips/loongson/common/cs5536/cs5536_mfgpt.c | 9 +++++--
arch/mips/ralink/cevt-rt3352.c | 12 ++++++---
arch/mips/sgi-ip27/ip27-timer.c | 14 ++++++++--
arch/mips/sni/time.c | 7 +++--
arch/mn10300/kernel/cevt-mn10300.c | 14 ++++++++--
arch/openrisc/kernel/time.c | 7 +++--
arch/powerpc/kernel/time.c | 20 +++++++++++---
arch/s390/kernel/time.c | 14 ++++++++--
arch/score/kernel/time.c | 7 ++---
arch/sh/kernel/localtimer.c | 15 +++++++++--
arch/sparc/kernel/time_32.c | 20 +++++++++-----
arch/sparc/kernel/time_64.c | 7 +++--
arch/tile/kernel/time.c | 15 +++++++++--
arch/um/kernel/time.c | 7 +++--
arch/unicore32/kernel/time.c | 7 +++--
arch/x86/kernel/apic/apic.c | 10 ++++---
arch/x86/kernel/hpet.c | 19 +++++++------
arch/x86/lguest/boot.c | 7 +++--
arch/x86/platform/uv/uv_time.c | 9 ++++---
arch/x86/xen/time.c | 14 +++++++---
arch/xtensa/kernel/time.c | 9 ++++---
drivers/clocksource/arm_arch_timer.c | 36 ++++++++++++++-----------
drivers/clocksource/arm_global_timer.c | 9 ++++---
drivers/clocksource/bcm2835_timer.c | 8 +++---
drivers/clocksource/bcm_kona_timer.c | 12 ++++++---
drivers/clocksource/cadence_ttc_timer.c | 7 +++--
drivers/clocksource/cs5535-clockevt.c | 17 +++++++++---
drivers/clocksource/dummy_timer.c | 15 +++++++++--
drivers/clocksource/dw_apb_timer.c | 7 +++--
drivers/clocksource/em_sti.c | 11 +++++---
drivers/clocksource/exynos_mct.c | 16 +++++++----
drivers/clocksource/i8253.c | 8 ++++--
drivers/clocksource/metag_generic.c | 7 +++--
drivers/clocksource/moxart_timer.c | 10 ++++---
drivers/clocksource/mxs_timer.c | 7 +++--
drivers/clocksource/nomadik-mtu.c | 7 +++--
drivers/clocksource/qcom-timer.c | 10 ++++---
drivers/clocksource/samsung_pwm_timer.c | 7 +++--
drivers/clocksource/sh_cmt.c | 9 ++++---
drivers/clocksource/sh_mtu2.c | 9 ++++---
drivers/clocksource/sh_tmu.c | 9 ++++---
drivers/clocksource/sun4i_timer.c | 11 +++++---
drivers/clocksource/tcb_clksrc.c | 11 +++++---
drivers/clocksource/tegra20_timer.c | 7 +++--
drivers/clocksource/time-armada-370-xp.c | 20 ++++++++++----
drivers/clocksource/time-efm32.c | 7 +++--
drivers/clocksource/time-orion.c | 19 ++++++++++---
drivers/clocksource/timer-keystone.c | 9 ++++---
drivers/clocksource/timer-marco.c | 12 ++++++---
drivers/clocksource/timer-prima2.c | 7 +++--
drivers/clocksource/timer-sun5i.c | 11 +++++---
drivers/clocksource/timer-u300.c | 7 +++--
drivers/clocksource/vf_pit_timer.c | 12 ++++++---
drivers/clocksource/vt8500_timer.c | 7 +++--
drivers/clocksource/zevio-timer.c | 8 +++---
include/linux/clockchips.h | 4 +--
kernel/time/clockevents.c | 7 +++--
kernel/time/tick-broadcast-hrtimer.c | 11 +++++---
kernel/time/timer_list.c | 2 +-
105 files changed, 783 insertions(+), 305 deletions(-)
--
2.0.0.rc2
From: Viresh Kumar <viresh.kumar(a)linaro.org>
3.12-stable review patch. If anyone has any objections, please let me know.
===============
commit 84ea7fe37908254c3bd90910921f6e1045c1747a upstream.
switch_hrtimer_base() calls hrtimer_check_target() which ensures that
we do not migrate a timer to a remote cpu if the timer expires before
the current programmed expiry time on that remote cpu.
But __hrtimer_start_range_ns() calls switch_hrtimer_base() before the
new expiry time is set. So the sanity check in hrtimer_check_target()
is operating on stale or even uninitialized data.
Update expiry time before calling switch_hrtimer_base().
[ tglx: Rewrote changelog once again ]
Signed-off-by: Viresh Kumar <viresh.kumar(a)linaro.org>
Cc: linaro-kernel(a)lists.linaro.org
Cc: linaro-networking(a)linaro.org
Cc: fweisbec(a)gmail.com
Cc: arvind.chauhan(a)arm.com
Link: http://lkml.kernel.org/r/81999e148745fc51bbcd0615823fbab9b2e87e23.139988225…
Signed-off-by: Thomas Gleixner <tglx(a)linutronix.de>
Signed-off-by: Jiri Slaby <jslaby(a)suse.cz>
---
kernel/hrtimer.c | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/kernel/hrtimer.c b/kernel/hrtimer.c
index 6de65d8a70e2..aa149222cd8e 100644
--- a/kernel/hrtimer.c
+++ b/kernel/hrtimer.c
@@ -1002,11 +1002,8 @@ int __hrtimer_start_range_ns(struct hrtimer *timer, ktime_t tim,
/* Remove an active timer from the queue: */
ret = remove_hrtimer(timer, base);
- /* Switch the timer base, if necessary: */
- new_base = switch_hrtimer_base(timer, base, mode & HRTIMER_MODE_PINNED);
-
if (mode & HRTIMER_MODE_REL) {
- tim = ktime_add_safe(tim, new_base->get_time());
+ tim = ktime_add_safe(tim, base->get_time());
/*
* CONFIG_TIME_LOW_RES is a temporary way for architectures
* to signal that they simply return xtime in
@@ -1021,6 +1018,9 @@ int __hrtimer_start_range_ns(struct hrtimer *timer, ktime_t tim,
hrtimer_set_expires_range_ns(timer, tim, delta_ns);
+ /* Switch the timer base, if necessary: */
+ new_base = switch_hrtimer_base(timer, base, mode & HRTIMER_MODE_PINNED);
+
timer_stats_hrtimer_set_start_info(timer);
leftmost = enqueue_hrtimer(timer, new_base);
--
1.9.3
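The ordering problem fixed by this patch can be reproduced with a short user-space sketch. The structs and sketch_* functions are hypothetical simplifications: one expiry per timer, one "next programmed event" per remote CPU, and a boolean migration decision.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified timer and remote CPU base. */
struct sketch_timer {
	int64_t expires;
};

struct sketch_cpu_base {
	int64_t expires_next; /* next event already programmed on that CPU */
};

/*
 * Returns true when the timer must NOT be migrated: it would expire before
 * the remote CPU's currently programmed event.
 */
static bool sketch_check_target(const struct sketch_timer *timer,
				const struct sketch_cpu_base *remote)
{
	return timer->expires < remote->expires_next;
}

/* Fixed ordering: store the new expiry first, then decide about migration. */
static bool sketch_start_timer(struct sketch_timer *timer, int64_t new_expires,
			       const struct sketch_cpu_base *remote)
{
	timer->expires = new_expires;

	/* true means "migrated to the remote CPU" in this sketch */
	return !sketch_check_target(timer, remote);
}
```

If the check ran before the assignment, it would compare the timer's old expiry against the remote CPU's next event and could allow a migration that the new expiry forbids.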
3.2.60-rc1 review patch. If anyone has any objections, please let me know.
------------------
From: Viresh Kumar <viresh.kumar(a)linaro.org>
commit 84ea7fe37908254c3bd90910921f6e1045c1747a upstream.
switch_hrtimer_base() calls hrtimer_check_target() which ensures that
we do not migrate a timer to a remote cpu if the timer expires before
the current programmed expiry time on that remote cpu.
But __hrtimer_start_range_ns() calls switch_hrtimer_base() before the
new expiry time is set. So the sanity check in hrtimer_check_target()
is operating on stale or even uninitialized data.
Update expiry time before calling switch_hrtimer_base().
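The ordering problem described above can be illustrated with a small userspace model. This is only a sketch: the model_* types and functions are hypothetical stand-ins, not the kernel's, and model_check_target() only mirrors the intent of hrtimer_check_target() as described in this changelog.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins for struct hrtimer and the per-CPU base. */
struct model_timer {
	int64_t expires;        /* expiry currently stored in the timer */
};

struct model_cpu_base {
	int64_t expires_next;   /* next event already programmed on that CPU */
};

/*
 * Models the intent of hrtimer_check_target(): returns true when the timer
 * would expire no later than the event already programmed on the target
 * CPU, in which case the timer must not be migrated there.
 */
static bool model_check_target(const struct model_timer *timer,
			       const struct model_cpu_base *target)
{
	return timer->expires <= target->expires_next;
}

/* Old ordering: decide on migration first, store the new expiry afterwards. */
static bool model_start_old(struct model_timer *timer, int64_t new_expires,
			    const struct model_cpu_base *remote)
{
	bool migrate = !model_check_target(timer, remote); /* sees stale expiry */

	timer->expires = new_expires;
	return migrate;
}

/* Fixed ordering: store the new expiry first, then decide on migration. */
static bool model_start_fixed(struct model_timer *timer, int64_t new_expires,
			      const struct model_cpu_base *remote)
{
	timer->expires = new_expires;
	return !model_check_target(timer, remote);
}
```

With a stale expiry of 1000, a remote CPU programmed for 500 and a new expiry of 100, the old ordering allows the migration although the timer would fire before the remote CPU's next event, while the fixed ordering refuses it.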
[ tglx: Rewrote changelog once again ]
Signed-off-by: Viresh Kumar <viresh.kumar(a)linaro.org>
Cc: linaro-kernel(a)lists.linaro.org
Cc: linaro-networking(a)linaro.org
Cc: fweisbec(a)gmail.com
Cc: arvind.chauhan(a)arm.com
Link: http://lkml.kernel.org/r/81999e148745fc51bbcd0615823fbab9b2e87e23.139988225…
Signed-off-by: Thomas Gleixner <tglx(a)linutronix.de>
Signed-off-by: Ben Hutchings <ben(a)decadent.org.uk>
---
kernel/hrtimer.c | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)
--- a/kernel/hrtimer.c
+++ b/kernel/hrtimer.c
@@ -980,11 +980,8 @@ int __hrtimer_start_range_ns(struct hrti
/* Remove an active timer from the queue: */
ret = remove_hrtimer(timer, base);
- /* Switch the timer base, if necessary: */
- new_base = switch_hrtimer_base(timer, base, mode & HRTIMER_MODE_PINNED);
-
if (mode & HRTIMER_MODE_REL) {
- tim = ktime_add_safe(tim, new_base->get_time());
+ tim = ktime_add_safe(tim, base->get_time());
/*
* CONFIG_TIME_LOW_RES is a temporary way for architectures
* to signal that they simply return xtime in
@@ -999,6 +996,9 @@ int __hrtimer_start_range_ns(struct hrti
hrtimer_set_expires_range_ns(timer, tim, delta_ns);
+ /* Switch the timer base, if necessary: */
+ new_base = switch_hrtimer_base(timer, base, mode & HRTIMER_MODE_PINNED);
+
timer_stats_hrtimer_set_start_info(timer);
leftmost = enqueue_hrtimer(timer, new_base);
Drivers that expect the CPU's OPPs from the device tree initialize the OPP table
themselves by calling of_init_opp_table(), and there is nothing driver-specific
in that. They all do it in the same redundant way.
It would be better to get rid of this redundancy by initializing CPU OPPs from
CPU core code for all CPUs (that have an "operating-points" property defined in
their node).
This patchset calls of_init_opp_table() directly from register_cpu() right
after the CPU device is registered. Later patches update the existing cpufreq
drivers which were calling of_init_opp_table() directly.
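The idea can be sketched in userspace as follows. Everything here is a hypothetical model: the model_* names are invented, the -ENODEV return for a missing property is an assumption, and treating an OPP failure as non-fatal for CPU registration is also an assumption rather than something stated in this cover letter.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for a CPU device and its device tree node. */
struct model_cpu_device {
	bool has_operating_points;  /* node carries "operating-points" */
	bool opp_table_ready;       /* set once the table is built */
};

/* Models of_init_opp_table(): build the table if the property exists. */
static int model_of_init_opp_table(struct model_cpu_device *dev)
{
	if (!dev->has_operating_points)
		return -ENODEV;     /* assumed error code for this sketch */

	dev->opp_table_ready = true;
	return 0;
}

/*
 * Models register_cpu(): after the device is registered, the core tries to
 * set up the OPP table once, so individual drivers no longer have to.
 */
static int model_register_cpu(struct model_cpu_device *dev)
{
	/* ... device registration would happen here ... */

	/* Assumption: a missing OPP table does not fail CPU registration. */
	(void)model_of_init_opp_table(dev);
	return 0;
}
```

A cpufreq driver in this model would only check opp_table_ready instead of calling the init routine itself.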
V3->V4:
- Remove of_init_cpu_opp_table() and use of_init_opp_table() instead after
adding more print messages to it.
- Drop a rather unrelated patch which was getting rid of dependency on thermal
for cpufreq-cpu0. Will be sent separately.
Cc: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Cc: Amit Daniel Kachhap <amit.daniel(a)samsung.com>
Cc: Kukjin Kim <kgene.kim(a)samsung.com>
Cc: Shawn Guo <shawn.guo(a)linaro.org>
Cc: Sudeep Holla <sudeep.holla(a)arm.com>
Viresh Kumar (8):
opp: of_init_opp_table(): return -ENOSYS when feature isn't
implemented
opp: call of_node_{get|put}() from of_init_opp_table()
opp: Enhance debug messages in of_init_opp_table()
driver/core: cpu: initialize opp table
cpufreq: arm_big_little: don't initialize opp table
cpufreq: imx6q: don't initialize opp table
cpufreq: cpufreq-cpu0: don't initialize opp table
cpufreq: exynos5440: don't initialize opp table
arch/arm/mach-imx/mach-imx6q.c | 36 ++++++++----------------------------
drivers/base/cpu.c | 11 +++++++----
drivers/base/power/opp.c | 11 ++++++++++-
drivers/cpufreq/arm_big_little.c | 12 +++++++-----
drivers/cpufreq/arm_big_little_dt.c | 18 ------------------
drivers/cpufreq/cpufreq-cpu0.c | 6 ------
drivers/cpufreq/exynos5440-cpufreq.c | 6 ------
drivers/cpufreq/imx6q-cpufreq.c | 20 +-------------------
include/linux/pm_opp.h | 2 +-
9 files changed, 34 insertions(+), 88 deletions(-)
--
2.0.0.rc2
Encapsulate the large portion of cpuidle_idle_call inside another
function so when CONFIG_CPU_IDLE=n, the code will be compiled out.
This also benefits the clarity of the code, as it removes
a nested indentation level.
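The pattern can be shown in isolation with a minimal userspace sketch. MODEL_CONFIG_CPU_IDLE and all model_* names are hypothetical; the real function bodies are in the diff below.

```c
#include <errno.h>

/* Counts how often the arch fallback ran; only used by this sketch. */
static int model_arch_idle_calls;

static void model_arch_cpu_idle(void)
{
	model_arch_idle_calls++;
}

#ifdef MODEL_CONFIG_CPU_IDLE
/* Full implementation: select a state, enter it, reflect on the outcome. */
static int model_inner_idle_call(void)
{
	return 0;
}
#else
/* Stub: the whole cpuidle path is compiled out and reports "not available". */
static inline int model_inner_idle_call(void)
{
	return -ENOSYS;
}
#endif

/* The caller needs no #ifdef: a non-zero return selects the arch fallback. */
static void model_idle_call(void)
{
	int ret = model_inner_idle_call();

	if (ret)
		model_arch_cpu_idle();
}
```

Built without MODEL_CONFIG_CPU_IDLE defined, every call to model_idle_call() ends up in model_arch_cpu_idle().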
Signed-off-by: Daniel Lezcano <daniel.lezcano(a)linaro.org>
---
kernel/sched/idle.c | 161 +++++++++++++++++++++++++++------------------------
1 file changed, 86 insertions(+), 75 deletions(-)
diff --git a/kernel/sched/idle.c b/kernel/sched/idle.c
index b8cd302..f2f4bc9 100644
--- a/kernel/sched/idle.c
+++ b/kernel/sched/idle.c
@@ -63,6 +63,90 @@ void __weak arch_cpu_idle(void)
local_irq_enable();
}
+#ifdef CONFIG_CPU_IDLE
+static int __cpuidle_idle_call(void)
+{
+ struct cpuidle_device *dev = __this_cpu_read(cpuidle_devices);
+ struct cpuidle_driver *drv = cpuidle_get_cpu_driver(dev);
+ int next_state, entered_state, ret;
+ bool broadcast;
+
+ /*
+ * Check if the cpuidle framework is ready, otherwise fallback
+ * to the default arch specific idle method
+ */
+ ret = cpuidle_enabled(drv, dev);
+ if (ret)
+ return ret;
+
+ /*
+ * Ask the governor to choose an idle state it thinks
+ * it is convenient to go to. There is *always* a
+ * convenient idle state
+ */
+ next_state = cpuidle_select(drv, dev);
+
+ /*
+ * The idle task must be scheduled, it is pointless to
+ * go to idle, just update no idle residency and get
+ * out of this function
+ */
+ if (current_clr_polling_and_test()) {
+ dev->last_residency = 0;
+ entered_state = next_state;
+ local_irq_enable();
+ } else {
+ broadcast = !!(drv->states[next_state].flags &
+ CPUIDLE_FLAG_TIMER_STOP);
+
+ if (broadcast)
+ /*
+ * Tell the time framework to switch to a
+ * broadcast timer because our local timer
+ * will be shutdown. If a local timer is used
+ * from another cpu as a broadcast timer, this
+ * call may fail if it is not available
+ */
+ ret = clockevents_notify(
+ CLOCK_EVT_NOTIFY_BROADCAST_ENTER,
+ &dev->cpu);
+
+ if (!ret) {
+ trace_cpu_idle_rcuidle(next_state, dev->cpu);
+
+ /*
+ * Enter the idle state previously returned by
+ * the governor decision. This function will
+ * block until an interrupt occurs and will
+ * take care of re-enabling the local
+ * interrupts
+ */
+ entered_state = cpuidle_enter(drv, dev, next_state);
+
+ trace_cpu_idle_rcuidle(PWR_EVENT_EXIT, dev->cpu);
+
+ if (broadcast)
+ clockevents_notify(
+ CLOCK_EVT_NOTIFY_BROADCAST_EXIT,
+ &dev->cpu);
+
+ /*
+ * Give the governor an opportunity to reflect
+ * on the outcome
+ */
+ cpuidle_reflect(dev, entered_state);
+ }
+ }
+
+ return 0;
+}
+#else
+static int inline __cpuidle_idle_call(void)
+{
+ return -ENOSYS;
+}
+#endif
+
/**
* cpuidle_idle_call - the main idle function
*
@@ -70,10 +154,7 @@ void __weak arch_cpu_idle(void)
*/
static void cpuidle_idle_call(void)
{
- struct cpuidle_device *dev = __this_cpu_read(cpuidle_devices);
- struct cpuidle_driver *drv = cpuidle_get_cpu_driver(dev);
- int next_state, entered_state, ret;
- bool broadcast;
+ int ret;
/*
* Check if the idle task must be rescheduled. If it is the
@@ -100,80 +181,10 @@ static void cpuidle_idle_call(void)
rcu_idle_enter();
/*
- * Check if the cpuidle framework is ready, otherwise fallback
- * to the default arch specific idle method
- */
- ret = cpuidle_enabled(drv, dev);
-
- if (!ret) {
- /*
- * Ask the governor to choose an idle state it thinks
- * it is convenient to go to. There is *always* a
- * convenient idle state
- */
- next_state = cpuidle_select(drv, dev);
-
- /*
- * The idle task must be scheduled, it is pointless to
- * go to idle, just update no idle residency and get
- * out of this function
- */
- if (current_clr_polling_and_test()) {
- dev->last_residency = 0;
- entered_state = next_state;
- local_irq_enable();
- } else {
- broadcast = !!(drv->states[next_state].flags &
- CPUIDLE_FLAG_TIMER_STOP);
-
- if (broadcast)
- /*
- * Tell the time framework to switch
- * to a broadcast timer because our
- * local timer will be shutdown. If a
- * local timer is used from another
- * cpu as a broadcast timer, this call
- * may fail if it is not available
- */
- ret = clockevents_notify(
- CLOCK_EVT_NOTIFY_BROADCAST_ENTER,
- &dev->cpu);
-
- if (!ret) {
- trace_cpu_idle_rcuidle(next_state, dev->cpu);
-
- /*
- * Enter the idle state previously
- * returned by the governor
- * decision. This function will block
- * until an interrupt occurs and will
- * take care of re-enabling the local
- * interrupts
- */
- entered_state = cpuidle_enter(drv, dev,
- next_state);
-
- trace_cpu_idle_rcuidle(PWR_EVENT_EXIT,
- dev->cpu);
-
- if (broadcast)
- clockevents_notify(
- CLOCK_EVT_NOTIFY_BROADCAST_EXIT,
- &dev->cpu);
-
- /*
- * Give the governor an opportunity to reflect on the
- * outcome
- */
- cpuidle_reflect(dev, entered_state);
- }
- }
- }
-
- /*
* We can't use the cpuidle framework, let's use the default
* idle routine
*/
+ ret = __cpuidle_idle_call();
if (ret)
arch_cpu_idle();
--
1.7.9.5
Douglas Anderson recently pointed out an interesting problem due to which
udelay() was expiring earlier than it should.
While transitioning between frequencies, a few platforms may temporarily switch
to a stable frequency while waiting for the main PLL to stabilize.
For example: when we transition between very low frequencies on exynos, like
between 200MHz and 300MHz, we may temporarily switch to a PLL running at 800MHz.
No CPUFREQ notification is sent for that. That means there's a period of time
when we're running at 800MHz but loops_per_jiffy is calibrated for somewhere
between 200MHz and 300MHz, so udelay() returns too early.
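To put a number on the effect, here is a small worked example. It assumes a purely loop-based delay whose loop count scales linearly with CPU frequency; model_effective_delay_us() is a hypothetical helper, not a kernel function.

```c
#include <stdint.h>

/*
 * A loop-based delay spins for a loop count derived from the calibrated
 * frequency. If the CPU actually runs faster, the same number of loops
 * completes sooner, so the real delay shrinks by calibrated/actual.
 */
static uint64_t model_effective_delay_us(uint64_t requested_us,
					 uint64_t calibrated_khz,
					 uint64_t actual_khz)
{
	if (actual_khz == 0)
		return 0;   /* avoid dividing by zero in this sketch */

	return requested_us * calibrated_khz / actual_khz;
}
```

Under these assumptions, a udelay(100) calibrated for 300MHz but executed at 800MHz lasts about 37us, and one calibrated for 200MHz lasts 25us.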
To get this fixed in a generic way, let's introduce another set of callbacks,
get_intermediate() and target_intermediate(), only for drivers with
target_index() set and CPUFREQ_ASYNC_NOTIFICATION unset.
get_intermediate() should return a stable intermediate frequency the platform
wants to switch to, and target_intermediate() should set the CPU to that
frequency before jumping to the frequency corresponding to 'index'. The core
will take care of sending notifications, and the driver doesn't have to handle
them in target_intermediate() or target_index().
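The sequence described above can be modelled in userspace roughly like this. It is a sketch under assumptions: all model_* names are hypothetical, one log entry stands in for a pre/post notification pair, and recovery to the old frequency on failure is deliberately left out.

```c
#include <stddef.h>

/* Hypothetical, simplified model; names do not match the kernel's. */
struct model_driver {
	unsigned int (*get_intermediate)(unsigned int index);
	int (*target_intermediate)(unsigned int index);
	int (*target_index)(unsigned int index);
};

struct model_transition {
	unsigned int old_khz;
	unsigned int new_khz;
};

#define MODEL_LOG_MAX 8
static struct model_transition model_log[MODEL_LOG_MAX];
static size_t model_log_len;

/* Stands in for the notifications the core sends around a transition. */
static void model_notify(unsigned int old_khz, unsigned int new_khz)
{
	if (model_log_len < MODEL_LOG_MAX) {
		model_log[model_log_len].old_khz = old_khz;
		model_log[model_log_len].new_khz = new_khz;
		model_log_len++;
	}
}

/* Core side: optional hop to the intermediate frequency, then the final one. */
static int model_core_target(const struct model_driver *drv,
			     unsigned int *cur_khz, unsigned int index,
			     unsigned int target_khz)
{
	unsigned int inter_khz = 0;
	int ret;

	if (drv->get_intermediate)
		inter_khz = drv->get_intermediate(index);

	if (inter_khz) {        /* zero means no intermediate hop is needed */
		ret = drv->target_intermediate(index);
		if (ret)
			return ret;
		model_notify(*cur_khz, inter_khz);
		*cur_khz = inter_khz;
	}

	ret = drv->target_index(index);
	if (ret)
		return ret;     /* restoring the old frequency is omitted here */
	model_notify(*cur_khz, target_khz);
	*cur_khz = target_khz;
	return 0;
}

/* Example driver callbacks: always hop through a stable 800MHz PLL. */
static unsigned int model_get_intermediate(unsigned int index)
{
	(void)index;
	return 800000;
}

static int model_target_ok(unsigned int index)
{
	(void)index;
	return 0;
}
```

In this model the driver callbacks never touch the log; only the core records transitions, so observers always see 200MHz -> 800MHz -> 300MHz rather than a single jump.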
This patchset also updates Tegra to use this new infrastructure, and has
already been tested by Stephen.
V4->V5:
- Moved setting old frequency to __target_index() from __target_intermediate()
- Replaced retval with 0 during call to cpufreq_freq_transition_end() for
restoring to restored freq.
- Fix important issues with Tegra driver as reported by Stephen.
- Dropped patch 1, already applied: "cpufreq: handle calls to ->target_index()
in separate routine"
V3->V4:
- Allow get_intermediate() to return zero when we don't need to switch to
intermediate first
- Get rid of 'goto' and create another routine for handling intermediate freqs
- Allow target_index() to fail, it's not a crime :)
- Fix tegra driver to return zero from get_intermediate() for few situations
(refer to patch 3/4)
- Fix issues with tegra's patch, like s/rate/rate * 1000
- Overall there are more modifications than what Doug requested as I felt we
  need better support from core.
- Looks much better now, thanks Doug :)
V2->V3:
- Fix spelling error: s/Uset/Used
- Update tegra with the changes Stephen suggested
- Include a dependency patch sent separately earlier (3/4)
V1->V2: Almost changed completely, V1 was here: https://lkml.org/lkml/2014/5/15/40
Viresh Kumar (2):
cpufreq: add support for intermediate (stable) frequencies
cpufreq: Tegra: implement intermediate frequency callbacks
Documentation/cpu-freq/cpu-drivers.txt | 29 +++++++++-
drivers/cpufreq/cpufreq.c | 67 ++++++++++++++++++++---
drivers/cpufreq/tegra-cpufreq.c | 97 ++++++++++++++++++++++------------
include/linux/cpufreq.h | 25 +++++++++
4 files changed, 174 insertions(+), 44 deletions(-)
--
2.0.0.rc2
Hi Alexandru,
No, I'm afraid not the stable one, but this one:
https://git.linaro.org/kernel/linux-linaro-tracking.git
There is a wiki page for that Linaro Kernel tree process:
https://wiki.linaro.org/Platform/DevPlatform/LinuxLinaroKernelTreeProcess
I added Andrey Konovalov to CC list since he currently is the maintainer of
linux-linaro kernel and hopefully can provide you more information.
Thanks.
On 6 June 2014 00:17, Alexandru Rosca <srosca(a)bu.edu> wrote:
> Hi Botao,
>
> Thanks for your help. I am using Ubuntu trusty 14.05. When I type uname -a
> I get Linux linaro 3.15.0-1 arndale. However the latest version on that
> link seems to be 3.1. Should I use the one in linux-linaro-stable.git?
>
> Cheers
>
>
> On Thu, Jun 5, 2014 at 10:11 AM, Botao Sun <botao.sun(a)linaro.org> wrote:
>
>> Hi Alexandru,
>>
>> Here is the Linaro Kernel git repository list:
>>
>> https://git.linaro.org/?a=project_list;pf=kernel
>>
>> And which platform are you working on? Android or ubuntu? I ask because
>> the Kernel version on Android is the old one: 3.9.1
>>
>> FYI.
>>
>>
>>
>> On 5 June 2014 23:56, Alexandru Rosca <srosca(a)bu.edu> wrote:
>>
>>> Hi Botao,
>>>
>>> The instructions online for building and deploying a linux kernel seems
>>> to be for version 3.1. I am using the latest arndale OS from the linaro
>>> releases and would like to know which source code you suggest I compile
>>> with the config file.
>>>
>>> Cheers,
>>> Alex
>>>
>>>
>>> On Thu, Jun 5, 2014 at 9:47 AM, Botao Sun <botao.sun(a)linaro.org> wrote:
>>>
>>>> + Linaro Kernel and Samsung Landing Team
>>>>
>>>> Hi Alexandru,
>>>>
>>>> As I know for the Linux mainline Kernel, we can run make menuconfig
>>>> then search the keyword by typing "/" and USB_SERIAL.
>>>>
>>>> Since we already have Linaro Samsung Arndale Kernel config file, then
>>>> it would be better to use this file and modify USB_SERIAL related items
>>>> later:
>>>>
>>>>
>>>> https://www.kernel.org/doc/index-old.html#Using_an_existing_configuration
>>>>
>>>> FYI.
>>>>
>>>> Thanks.
>>>>
>>>>
>>>> Best Regards
>>>> Botao Sun
>>>>
>>>>
>>>> On 5 June 2014 07:11, Alexandru Rosca <srosca(a)bu.edu> wrote:
>>>>
>>>>> Dear Botao Sun,
>>>>>
>>>>> I have an arndale board for which I would like to communicate with
>>>>> serially via an FTDI and some microcontrollers. It seems that the linaro
>>>>> kernel does not support this. I was thinking of compiling a kernel with the
>>>>> USB SERIAL config option set. Could you advise me on how to do this? Can I
>>>>> just use the config file for the arndale and modify it for compilation?
>>>>>
>>>>> Thank you,
>>>>> Sasha
>>>>>
>>>>
>>>>
>>>
>>
>
When a timer is enqueued or modified on a NO_HZ_FULL target (local or remote),
the target is expected to re-evaluate its timer wheel to decide if tick must be
restarted to handle timer expirations.
If it doesn't re-evaluate the timer wheel and restart the tick, it won't be able
to call the timer's handler on expiration. The handler is delayed until the tick
is next restarted. Currently the maximum delay is 1 second, as returned by
scheduler_tick_max_deferment(), but it can increase in the future.
To handle this, currently we are calling wake_up_nohz_cpu() from add_timer_on()
but what about timers enqueued/modified with other APIs?
For example, in __mod_timer() we get target cpu (where the timer should
get enqueued) by calling get_nohz_timer_target() and it is free to return a
NO_HZ_FULL cpu as well. So, we *should* re-evaluate timer wheel there as well,
otherwise call to timer's handler might be delayed as explained earlier.
In order to fix this issue we can move wake_up_nohz_cpu(cpu) to
internal_add_timer() so that it is well handled for any add-timer API.
LKML discussion about this: https://lkml.org/lkml/2014/6/4/169
This requires internal_add_timer() to get cpu number from per-cpu object 'base',
as all the callers might not have cpu number to pass to internal_add_timer().
For example, in __mod_timer() we find timer's base from 'timer' pointer and not
from per-cpu arithmetic.
Thus, this patch adds another field 'cpu' in 'struct tvec_base' which will store
cpu number of the cpu it belongs to.
Next patch will then move wake_up_nohz_cpu() to internal_add_timer().
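How the new field is meant to be used can be sketched in userspace as below. The follow-up patch is not part of this mail, so this is only an assumption about its shape; every model_* name is hypothetical.

```c
#define MODEL_NR_CPUS 4

/* Hypothetical stand-in for struct tvec_base with the new 'cpu' field. */
struct model_tvec_base {
	int cpu;                        /* CPU this base belongs to */
	unsigned long active_timers;
};

struct model_timer {
	struct model_tvec_base *base;   /* base the timer is queued on */
};

static struct model_tvec_base model_bases[MODEL_NR_CPUS];
static int model_last_woken_cpu = -1;

/* Models init_timers_cpu(): record the owning CPU in the base. */
static void model_init_timers_cpu(int cpu)
{
	model_bases[cpu].cpu = cpu;
}

/* Stands in for wake_up_nohz_cpu(); only records which CPU was kicked. */
static void model_wake_up_nohz_cpu(int cpu)
{
	model_last_woken_cpu = cpu;
}

/*
 * Models internal_add_timer() after the planned follow-up: the caller passes
 * only the base, and the CPU to wake is read from base->cpu.
 */
static void model_internal_add_timer(struct model_tvec_base *base,
				     struct model_timer *timer)
{
	timer->base = base;
	base->active_timers++;
	model_wake_up_nohz_cpu(base->cpu);
}
```

A caller such as __mod_timer(), which only knows the timer's base, can then trigger the wakeup without knowing the CPU number itself.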
Signed-off-by: Viresh Kumar <viresh.kumar(a)linaro.org>
---
kernel/timer.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/kernel/timer.c b/kernel/timer.c
index 3bb01a3..9e5f4f2 100644
--- a/kernel/timer.c
+++ b/kernel/timer.c
@@ -82,6 +82,7 @@ struct tvec_base {
unsigned long next_timer;
unsigned long active_timers;
unsigned long all_timers;
+ int cpu;
struct tvec_root tv1;
struct tvec tv2;
struct tvec tv3;
@@ -1568,6 +1569,7 @@ static int init_timers_cpu(int cpu)
}
spin_lock_init(&base->lock);
tvec_base_done[cpu] = 1;
+ base->cpu = cpu;
} else {
base = per_cpu(tvec_bases, cpu);
}
--
2.0.0.rc2
Hi Alexandru,
Here is the Linaro Kernel git repository list:
https://git.linaro.org/?a=project_list;pf=kernel
And which platform are you working on? Android or ubuntu? I ask because the
Kernel version on Android is the old one: 3.9.1
FYI.
On 5 June 2014 23:56, Alexandru Rosca <srosca(a)bu.edu> wrote:
> Hi Botao,
>
> The instructions online for building and deploying a linux kernel seems to
> be for version 3.1. I am using the latest arndale OS from the linaro
> releases and would like to know which source code you suggest I compile
> with the config file.
>
> Cheers,
> Alex
>
>
> On Thu, Jun 5, 2014 at 9:47 AM, Botao Sun <botao.sun(a)linaro.org> wrote:
>
>> + Linaro Kernel and Samsung Landing Team
>>
>> Hi Alexandru,
>>
>> As I know for the Linux mainline Kernel, we can run make menuconfig then
>> search the keyword by typing "/" and USB_SERIAL.
>>
>> Since we already have Linaro Samsung Arndale Kernel config file, then it
>> would be better to use this file and modify USB_SERIAL related items later:
>>
>> https://www.kernel.org/doc/index-old.html#Using_an_existing_configuration
>>
>> FYI.
>>
>> Thanks.
>>
>>
>> Best Regards
>> Botao Sun
>>
>>
>> On 5 June 2014 07:11, Alexandru Rosca <srosca(a)bu.edu> wrote:
>>
>>> Dear Botao Sun,
>>>
>>> I have an arndale board for which I would like to communicate with
>>> serially via an FTDI and some microcontrollers. It seems that the linaro
>>> kernel does not support this. I was thinking of compiling a kernel with the
>>> USB SERIAL config option set. Could you advise me on how to do this? Can I
>>> just use the config file for the arndale and modify it for compilation?
>>>
>>> Thank you,
>>> Sasha
>>>
>>
>>
>
+ Linaro Kernel and Samsung Landing Team
Hi Alexandru,
As far as I know, for the mainline Linux kernel we can run make menuconfig,
then search for the keyword by typing "/" and USB_SERIAL.
Since we already have a Linaro Samsung Arndale kernel config file, it
would be better to use this file and modify the USB_SERIAL related items later:
https://www.kernel.org/doc/index-old.html#Using_an_existing_configuration
FYI.
Thanks.
Best Regards
Botao Sun
On 5 June 2014 07:11, Alexandru Rosca <srosca(a)bu.edu> wrote:
> Dear Botao Sun,
>
> I have an arndale board for which I would like to communicate with
> serially via an FTDI and some microcontrollers. It seems that the linaro
> kernel does not support this. I was thinking of compiling a kernel with the
> USB SERIAL config option set. Could you advise me on how to do this? Can I
> just use the config file for the arndale and modify it for compilation?
>
> Thank you,
> Sasha
>
From: Mark Brown <broonie(a)linaro.org>
Some of the generic drivers used on ARM class systems use OPP, so allow it
to be selected in Kconfig. No code is required for this; it is not clear
to me why there is a config option for it.
Signed-off-by: Mark Brown <broonie(a)linaro.org>
---
arch/arm64/Kconfig | 1 +
1 file changed, 1 insertion(+)
diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 8ccdd2646ae3..8256d6d09d33 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -2,6 +2,7 @@ config ARM64
def_bool y
select ARCH_HAS_ATOMIC64_DEC_IF_POSITIVE
select ARCH_USE_CMPXCHG_LOCKREF
+ select ARCH_HAS_OPP
select ARCH_HAS_SG_CHAIN
select ARCH_HAS_TICK_BROADCAST if GENERIC_CLOCKEVENTS_BROADCAST
select ARCH_WANT_OPTIONAL_GPIOLIB
--
2.0.0.rc2
"Power" is a very bad term in the scheduler context. There are so many
meanings that can be attached to it. And with the upcoming "power
aware" scheduler work, confusion is sure to happen.
The definition of "power" is typically the rate at which work is performed,
energy is converted or electric energy is transferred. The notion of
"compute capacity" is rather at odds with "power" to the point many
comments in the code have to make it explicit that "capacity" is the
actual intended meaning.
So let's make it clear what we mean by using "capacity" in place of "power"
directly in the code. That will make the introduction of actual "power
consumption" concepts much clearer later on.
This is based on the latest tip tree to apply correctly on top of existing
scheduler changes already queued there.
Changes from v1:
- capa_factor and SCHED_CAPA_* changed to be spelled "capacity" in full
to save peterz some Chupacabra nightmares
- some minor corrections in commit logs
- rebased on latest tip tree
arch/arm/kernel/topology.c | 54 +++----
include/linux/sched.h | 8 +-
kernel/sched/core.c | 87 ++++++-----
kernel/sched/fair.c | 323 ++++++++++++++++++++-------------------
kernel/sched/sched.h | 18 +--
5 files changed, 246 insertions(+), 244 deletions(-)
3.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Viresh Kumar <viresh.kumar(a)linaro.org>
commit 84ea7fe37908254c3bd90910921f6e1045c1747a upstream.
switch_hrtimer_base() calls hrtimer_check_target() which ensures that
we do not migrate a timer to a remote cpu if the timer expires before
the current programmed expiry time on that remote cpu.
But __hrtimer_start_range_ns() calls switch_hrtimer_base() before the
new expiry time is set. So the sanity check in hrtimer_check_target()
is operating on stale or even uninitialized data.
Update expiry time before calling switch_hrtimer_base().
[ tglx: Rewrote changelog once again ]
Signed-off-by: Viresh Kumar <viresh.kumar(a)linaro.org>
Cc: linaro-kernel(a)lists.linaro.org
Cc: linaro-networking(a)linaro.org
Cc: fweisbec(a)gmail.com
Cc: arvind.chauhan(a)arm.com
Link: http://lkml.kernel.org/r/81999e148745fc51bbcd0615823fbab9b2e87e23.139988225…
Signed-off-by: Thomas Gleixner <tglx(a)linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
kernel/hrtimer.c | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)
--- a/kernel/hrtimer.c
+++ b/kernel/hrtimer.c
@@ -985,11 +985,8 @@ int __hrtimer_start_range_ns(struct hrti
/* Remove an active timer from the queue: */
ret = remove_hrtimer(timer, base);
- /* Switch the timer base, if necessary: */
- new_base = switch_hrtimer_base(timer, base, mode & HRTIMER_MODE_PINNED);
-
if (mode & HRTIMER_MODE_REL) {
- tim = ktime_add_safe(tim, new_base->get_time());
+ tim = ktime_add_safe(tim, base->get_time());
/*
* CONFIG_TIME_LOW_RES is a temporary way for architectures
* to signal that they simply return xtime in
@@ -1004,6 +1001,9 @@ int __hrtimer_start_range_ns(struct hrti
hrtimer_set_expires_range_ns(timer, tim, delta_ns);
+ /* Switch the timer base, if necessary: */
+ new_base = switch_hrtimer_base(timer, base, mode & HRTIMER_MODE_PINNED);
+
timer_stats_hrtimer_set_start_info(timer);
leftmost = enqueue_hrtimer(timer, new_base);
3.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Viresh Kumar <viresh.kumar(a)linaro.org>
commit 84ea7fe37908254c3bd90910921f6e1045c1747a upstream.
switch_hrtimer_base() calls hrtimer_check_target() which ensures that
we do not migrate a timer to a remote cpu if the timer expires before
the current programmed expiry time on that remote cpu.
But __hrtimer_start_range_ns() calls switch_hrtimer_base() before the
new expiry time is set. So the sanity check in hrtimer_check_target()
is operating on stale or even uninitialized data.
Update expiry time before calling switch_hrtimer_base().
[ tglx: Rewrote changelog once again ]
Signed-off-by: Viresh Kumar <viresh.kumar(a)linaro.org>
Cc: linaro-kernel(a)lists.linaro.org
Cc: linaro-networking(a)linaro.org
Cc: fweisbec(a)gmail.com
Cc: arvind.chauhan(a)arm.com
Link: http://lkml.kernel.org/r/81999e148745fc51bbcd0615823fbab9b2e87e23.139988225…
Signed-off-by: Thomas Gleixner <tglx(a)linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
kernel/hrtimer.c | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)
--- a/kernel/hrtimer.c
+++ b/kernel/hrtimer.c
@@ -999,11 +999,8 @@ int __hrtimer_start_range_ns(struct hrti
/* Remove an active timer from the queue: */
ret = remove_hrtimer(timer, base);
- /* Switch the timer base, if necessary: */
- new_base = switch_hrtimer_base(timer, base, mode & HRTIMER_MODE_PINNED);
-
if (mode & HRTIMER_MODE_REL) {
- tim = ktime_add_safe(tim, new_base->get_time());
+ tim = ktime_add_safe(tim, base->get_time());
/*
* CONFIG_TIME_LOW_RES is a temporary way for architectures
* to signal that they simply return xtime in
@@ -1018,6 +1015,9 @@ int __hrtimer_start_range_ns(struct hrti
hrtimer_set_expires_range_ns(timer, tim, delta_ns);
+ /* Switch the timer base, if necessary: */
+ new_base = switch_hrtimer_base(timer, base, mode & HRTIMER_MODE_PINNED);
+
timer_stats_hrtimer_set_start_info(timer);
leftmost = enqueue_hrtimer(timer, new_base);
3.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Viresh Kumar <viresh.kumar(a)linaro.org>
commit 84ea7fe37908254c3bd90910921f6e1045c1747a upstream.
switch_hrtimer_base() calls hrtimer_check_target() which ensures that
we do not migrate a timer to a remote cpu if the timer expires before
the current programmed expiry time on that remote cpu.
But __hrtimer_start_range_ns() calls switch_hrtimer_base() before the
new expiry time is set. So the sanity check in hrtimer_check_target()
is operating on stale or even uninitialized data.
Update expiry time before calling switch_hrtimer_base().
[ tglx: Rewrote changelog once again ]
Signed-off-by: Viresh Kumar <viresh.kumar(a)linaro.org>
Cc: linaro-kernel(a)lists.linaro.org
Cc: linaro-networking(a)linaro.org
Cc: fweisbec(a)gmail.com
Cc: arvind.chauhan(a)arm.com
Link: http://lkml.kernel.org/r/81999e148745fc51bbcd0615823fbab9b2e87e23.139988225…
Signed-off-by: Thomas Gleixner <tglx(a)linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
kernel/hrtimer.c | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)
--- a/kernel/hrtimer.c
+++ b/kernel/hrtimer.c
@@ -1003,11 +1003,8 @@ int __hrtimer_start_range_ns(struct hrti
/* Remove an active timer from the queue: */
ret = remove_hrtimer(timer, base);
- /* Switch the timer base, if necessary: */
- new_base = switch_hrtimer_base(timer, base, mode & HRTIMER_MODE_PINNED);
-
if (mode & HRTIMER_MODE_REL) {
- tim = ktime_add_safe(tim, new_base->get_time());
+ tim = ktime_add_safe(tim, base->get_time());
/*
* CONFIG_TIME_LOW_RES is a temporary way for architectures
* to signal that they simply return xtime in
@@ -1022,6 +1019,9 @@ int __hrtimer_start_range_ns(struct hrti
hrtimer_set_expires_range_ns(timer, tim, delta_ns);
+ /* Switch the timer base, if necessary: */
+ new_base = switch_hrtimer_base(timer, base, mode & HRTIMER_MODE_PINNED);
+
timer_stats_hrtimer_set_start_info(timer);
leftmost = enqueue_hrtimer(timer, new_base);