Sometimes boot loaders set the CPU frequency to a value outside the frequency
table registered with the cpufreq core. In such cases the CPU might be unstable
if it has to run at that frequency for a long time, so it is better to switch
it to a frequency that is present in the frequency table.
Sachin recently found this problem with the cpufreq-cpu0 driver while testing
it on Exynos.
Set the CPUFREQ_NEED_INITIAL_FREQ_CHECK flag for the cpufreq-cpu0 driver.
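For context, the effect of CPUFREQ_NEED_INITIAL_FREQ_CHECK is roughly the
following; this is a simplified sketch of the idea only (check_initial_freq()
is a made-up helper name, not the actual core code):

#include <linux/cpufreq.h>

/*
 * Simplified sketch of what the core does at policy initialization when a
 * driver sets CPUFREQ_NEED_INITIAL_FREQ_CHECK (illustrative only; the real
 * core code differs in detail).
 */
static void check_initial_freq(struct cpufreq_policy *policy)
{
	/* policy->cur holds whatever frequency the boot loader left behind */
	if (cpufreq_frequency_table_get_index(policy, policy->cur) < 0) {
		/* not in the table: move to the closest valid frequency */
		__cpufreq_driver_target(policy, policy->cur,
					CPUFREQ_RELATION_L);
	}
}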
Reported-and-tested-by: Sachin Kamat <sachin.kamat@linaro.org>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
---
drivers/cpufreq/cpufreq-cpu0.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/cpufreq/cpufreq-cpu0.c b/drivers/cpufreq/cpufreq-cpu0.c
index 09b9129..ee1ae30 100644
--- a/drivers/cpufreq/cpufreq-cpu0.c
+++ b/drivers/cpufreq/cpufreq-cpu0.c
@@ -104,7 +104,7 @@ static int cpu0_cpufreq_init(struct cpufreq_policy *policy)
}
static struct cpufreq_driver cpu0_cpufreq_driver = {
- .flags = CPUFREQ_STICKY,
+ .flags = CPUFREQ_STICKY | CPUFREQ_NEED_INITIAL_FREQ_CHECK,
.verify = cpufreq_generic_frequency_table_verify,
.target_index = cpu0_set_target,
.get = cpufreq_generic_get,
--
2.0.0.rc2
'copy_prev_load' was recently added by commit 18b46ab (cpufreq: governor: Be
friendly towards latency-sensitive bursty workloads).
It is actually a bit redundant, as we also have 'prev_load', which can store any
integer value and can be used instead of 'copy_prev_load' by setting it to zero.
The true load can also turn out to be zero during long idle intervals (and hence
the actual value of 'prev_load' and the overloaded value can clash). However,
this is not a problem: if the true load really was zero in the previous
interval, it makes sense to evaluate the load afresh for the current interval
rather than copying the previous load.
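To make the overloading concrete, the update path conceptually becomes the
following (a condensed sketch of the resulting logic; the full change is in
the diff below):

	if (unlikely(wall_time > (2 * sampling_rate) && j_cdbs->prev_load)) {
		/* reuse the stored load, then consume it so it is used only once */
		load = j_cdbs->prev_load;
		j_cdbs->prev_load = 0;
	} else {
		load = 100 * (wall_time - idle_time) / wall_time;
		j_cdbs->prev_load = load;
	}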
So, drop 'copy_prev_load' and use 'prev_load' instead.
Update the comments as well to make this clearer.
There is another change here which was probably missed by Srivatsa in the last
round of updates: the unlikely() in the 'if' statement covered only half of the
condition, whereas the whole condition should come under it.
This also silences checkpatch, which was reporting the following with the
--strict option:
CHECK: Alignment should match open parenthesis
+ if (unlikely(wall_time > (2 * sampling_rate) &&
+ j_cdbs->prev_load)) {
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
---
Resend: Updated comments/logs as suggested by Srivatsa.
drivers/cpufreq/cpufreq_governor.c | 19 ++++++++++++++-----
drivers/cpufreq/cpufreq_governor.h | 9 +++++----
2 files changed, 19 insertions(+), 9 deletions(-)
diff --git a/drivers/cpufreq/cpufreq_governor.c b/drivers/cpufreq/cpufreq_governor.c
index 9004450..1b44496 100644
--- a/drivers/cpufreq/cpufreq_governor.c
+++ b/drivers/cpufreq/cpufreq_governor.c
@@ -131,15 +131,25 @@ void dbs_check_cpu(struct dbs_data *dbs_data, int cpu)
* timer would not have fired during CPU-idle periods. Hence
* an unusually large 'wall_time' (as compared to the sampling
* rate) indicates this scenario.
+ *
+ * prev_load can be zero in two cases and we must recalculate it
+ * for both cases:
+ * - during long idle intervals
+ * - explicitly set to zero
*/
- if (unlikely(wall_time > (2 * sampling_rate)) &&
- j_cdbs->copy_prev_load) {
+ if (unlikely(wall_time > (2 * sampling_rate) &&
+ j_cdbs->prev_load)) {
load = j_cdbs->prev_load;
- j_cdbs->copy_prev_load = false;
+
+ /*
+ * Perform a destructive copy, to ensure that we copy
+ * the previous load only once, upon the first wake-up
+ * from idle.
+ */
+ j_cdbs->prev_load = 0;
} else {
load = 100 * (wall_time - idle_time) / wall_time;
j_cdbs->prev_load = load;
- j_cdbs->copy_prev_load = true;
}
if (load > max_load)
@@ -373,7 +383,6 @@ int cpufreq_governor_dbs(struct cpufreq_policy *policy,
(j_cdbs->prev_cpu_wall - j_cdbs->prev_cpu_idle);
j_cdbs->prev_load = 100 * prev_load /
(unsigned int) j_cdbs->prev_cpu_wall;
- j_cdbs->copy_prev_load = true;
if (ignore_nice)
j_cdbs->prev_cpu_nice =
diff --git a/drivers/cpufreq/cpufreq_governor.h b/drivers/cpufreq/cpufreq_governor.h
index c2a5b7e..cc401d1 100644
--- a/drivers/cpufreq/cpufreq_governor.h
+++ b/drivers/cpufreq/cpufreq_governor.h
@@ -134,12 +134,13 @@ struct cpu_dbs_common_info {
u64 prev_cpu_idle;
u64 prev_cpu_wall;
u64 prev_cpu_nice;
- unsigned int prev_load;
/*
- * Flag to ensure that we copy the previous load only once, upon the
- * first wake-up from idle.
+ * Used to keep track of load in the previous interval. However, when
+ * explicitly set to zero, it is used as a flag to ensure that we copy
+ * the previous load to the current interval only once, upon the first
+ * wake-up from idle.
*/
- bool copy_prev_load;
+ unsigned int prev_load;
struct cpufreq_policy *cur_policy;
struct delayed_work work;
/*
--
2.0.0.rc2
'copy_prev_load' was recently added by commit 18b46ab (cpufreq: governor: Be
friendly towards latency-sensitive bursty workloads).
It is actually a bit redundant, as we also have 'prev_load', which can store any
integer value and can be used instead of 'copy_prev_load' by setting it to zero
when we don't want to use the previous load.
So, drop 'copy_prev_load' and use 'prev_load' instead.
Update the comments as well to make this clearer.
There is another change here which was probably missed by Srivatsa in the last
round of updates: the unlikely() in the 'if' statement covered only half of the
condition, whereas the whole condition should come under it.
This also silences checkpatch, which was reporting the following with the
--strict option:
CHECK: Alignment should match open parenthesis
+ if (unlikely(wall_time > (2 * sampling_rate) &&
+ j_cdbs->prev_load)) {
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
---
drivers/cpufreq/cpufreq_governor.c | 13 ++++++++-----
drivers/cpufreq/cpufreq_governor.h | 8 ++++----
2 files changed, 12 insertions(+), 9 deletions(-)
diff --git a/drivers/cpufreq/cpufreq_governor.c b/drivers/cpufreq/cpufreq_governor.c
index 9004450..a1ad804 100644
--- a/drivers/cpufreq/cpufreq_governor.c
+++ b/drivers/cpufreq/cpufreq_governor.c
@@ -132,14 +132,18 @@ void dbs_check_cpu(struct dbs_data *dbs_data, int cpu)
* an unusually large 'wall_time' (as compared to the sampling
* rate) indicates this scenario.
*/
- if (unlikely(wall_time > (2 * sampling_rate)) &&
- j_cdbs->copy_prev_load) {
+ if (unlikely(wall_time > (2 * sampling_rate) &&
+ j_cdbs->prev_load)) {
load = j_cdbs->prev_load;
- j_cdbs->copy_prev_load = false;
+
+ /*
+ * Ensure that we copy the previous load only once, upon
+ * the first wake-up from idle.
+ */
+ j_cdbs->prev_load = 0;
} else {
load = 100 * (wall_time - idle_time) / wall_time;
j_cdbs->prev_load = load;
- j_cdbs->copy_prev_load = true;
}
if (load > max_load)
@@ -373,7 +377,6 @@ int cpufreq_governor_dbs(struct cpufreq_policy *policy,
(j_cdbs->prev_cpu_wall - j_cdbs->prev_cpu_idle);
j_cdbs->prev_load = 100 * prev_load /
(unsigned int) j_cdbs->prev_cpu_wall;
- j_cdbs->copy_prev_load = true;
if (ignore_nice)
j_cdbs->prev_cpu_nice =
diff --git a/drivers/cpufreq/cpufreq_governor.h b/drivers/cpufreq/cpufreq_governor.h
index c2a5b7e..d3082ee 100644
--- a/drivers/cpufreq/cpufreq_governor.h
+++ b/drivers/cpufreq/cpufreq_governor.h
@@ -134,12 +134,12 @@ struct cpu_dbs_common_info {
u64 prev_cpu_idle;
u64 prev_cpu_wall;
u64 prev_cpu_nice;
- unsigned int prev_load;
/*
- * Flag to ensure that we copy the previous load only once, upon the
- * first wake-up from idle.
+ * Used to store system load before going into idle, when set to zero:
+ * used as a flag to ensure that we copy the previous load only once,
+ * upon the first wake-up from idle.
*/
- bool copy_prev_load;
+ unsigned int prev_load;
struct cpufreq_policy *cur_policy;
struct delayed_work work;
/*
--
2.0.0.rc2
Hi,
Here goes the fifth version, and it is much lighter weight compared to earlier
versions. I hope I haven't missed any review comments here :)
More or less all patches have been updated to get rid of unrelated changes, like
removing unsupported modes or not disabling events for the default cases.
V4->V5: don't change the behavior of the 'default' case apart from returning an
error; it must behave exactly as it did earlier.
Here goes the mainline version of the cover letter:
A clockevent device should be stopped, or its events should be masked, if the
next event is expected at KTIME_MAX, i.e. no events are required for a very
long time. This would normally happen with NO_HZ (both NO_HZ_IDLE and
NO_HZ_FULL) when the tick-sched timer is removed and no more timers are
scheduled for a long time.
If we don't STOP the clockevent device, we are guaranteed to receive at least
one spurious interrupt, as hrtimer_force_reprogram() doesn't reprogram the event
device when expires == KTIME_MAX. Depending on the particular clockevent
implementation, this can even be a stream of spurious interrupts at the tick
rate (when the driver emulates ONESHOT on top of PERIODIC mode; this was
observed on at least one implementation: arm_arch_timer.c).
A simple (yet hacky) way to fix this would be to update
hrtimer_force_reprogram() to always reprogram the clockevent device, and to
update clockevent drivers to STOP generating events (or delay them to the
maximum time) when 'expires' is set to KTIME_MAX. The drawback is that every
clockevent driver has to be hacked for this particular case, and it is very
easy for new drivers to miss this.
Instead, Thomas suggested adding an optional mode, ONESHOT_STOPPED
(lkml.org/lkml/2014/5/9/508), to solve this problem.
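As a rough illustration of how the core might use such a mode once it exists
(a hypothetical sketch, not code from this series; the mode name and the exact
call site may well differ in the real implementation):

/*
 * Hypothetical sketch of core-side usage of the proposed ONESHOT_STOPPED
 * mode (tick_cpu_device comes from kernel/time/tick-internal.h).
 */
int tick_program_event(ktime_t expires, int force)
{
	struct clock_event_device *dev =
		__this_cpu_read(tick_cpu_device.evtdev);

	if (unlikely(expires.tv64 == KTIME_MAX)) {
		/* No events needed for a long time: stop the device. */
		clockevents_set_mode(dev, CLOCK_EVT_MODE_ONESHOT_STOPPED);
		return 0;
	}

	return clockevents_program_event(dev, expires, force);
}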
With this proposal, introducing the new ONESHOT_STOPPED mode requires the core
to know whether the platform implements the mode, so that the device can be
reprogrammed later.
In order for the core to tell whether the mode is implemented, the ->set_mode()
callback needs to be able to return success or failure.
To change the return type of ->set_mode(), Thomas suggested implementing
another callback, ->set_dev_mode(), with return type 'int'. We can then convert
clockevent drivers to this new interface instead of the existing ->set_mode()
and finally remove support for ->set_mode().
This patchset first adds the new callback with return capability,
->set_dev_mode(), then migrates all drivers to it one by one, and finally
removes the earlier ->set_mode() callback once it has no more users.
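For illustration, a converted driver callback would look roughly like this
(a sketch with a made-up driver name and the hardware programming omitted):

#include <linux/clockchips.h>

/*
 * Sketch of a driver using the new ->set_dev_mode() callback: unsupported
 * modes can now be reported to the core instead of being silently ignored.
 */
static int foo_timer_set_dev_mode(enum clock_event_mode mode,
				  struct clock_event_device *evt)
{
	switch (mode) {
	case CLOCK_EVT_MODE_ONESHOT:
		/* program the hardware for one-shot operation */
		break;
	case CLOCK_EVT_MODE_SHUTDOWN:
	case CLOCK_EVT_MODE_UNUSED:
		/* stop the event source */
		break;
	default:
		/* e.g. PERIODIC not supported by this hardware */
		return -ENOSYS;
	}

	return 0;
}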
FIXME: There are two issues which *may* need to be fixed separately.
1. A few drivers still have 'switch' cases for handling unsupported modes
   (e.g. drivers not supporting ONESHOT have a 'case ONESHOT' with or without a
   WARN/BUG/pr_err/etc.). As we now have a WARN() in the core, we don't need
   these in individual drivers and they can be removed.
2. A few clockevent drivers disable clock events as soon as we enter their
   ->set_dev_mode() callbacks, so the events stay disabled for the 'default'
   case as well. We *may* need to fix these drivers if we don't want them to
   disable events in the 'default' case. After this series we will never hit
   the 'default' case, as all cases are handled separately, but it might be
   necessary to fix them before the ONESHOT_STOPPED series goes in.
Viresh Kumar (8):
clockevents: add ->set_dev_mode() to struct clock_event_device
clockevents: arm: migrate to ->set_dev_mode()
clockevents: mips: migrate to ->set_dev_mode()
clockevents: sparc: migrate to ->set_dev_mode()
clockevents: x86: migrate to ->set_dev_mode()
clockevents: drivers/: migrate to ->set_dev_mode()
clockevents: misc: migrate to ->set_dev_mode()
clockevents: remove ->set_mode() from struct clock_event_device
arch/alpha/kernel/time.c | 31 ++++++++++++++++++---
arch/arc/kernel/time.c | 11 +++++---
arch/arm/common/timer-sp.c | 10 ++++---
arch/arm/kernel/smp_twd.c | 11 +++++---
arch/arm/mach-at91/at91rm9200_time.c | 9 +++++--
arch/arm/mach-at91/at91sam926x_time.c | 8 ++++--
arch/arm/mach-clps711x/common.c | 7 +++--
arch/arm/mach-cns3xxx/core.c | 12 ++++++---
arch/arm/mach-davinci/time.c | 7 +++--
arch/arm/mach-footbridge/dc21285-timer.c | 7 +++--
arch/arm/mach-gemini/time.c | 7 ++---
arch/arm/mach-imx/epit.c | 7 +++--
arch/arm/mach-imx/time.c | 7 +++--
arch/arm/mach-integrator/integrator_ap.c | 9 ++++---
arch/arm/mach-ixp4xx/common.c | 9 ++++---
arch/arm/mach-ks8695/time.c | 22 ++++++++++-----
arch/arm/mach-lpc32xx/timer.c | 7 +++--
arch/arm/mach-mmp/time.c | 11 +++++---
arch/arm/mach-netx/time.c | 9 ++++---
arch/arm/mach-omap1/time.c | 7 +++--
arch/arm/mach-omap1/timer32k.c | 7 +++--
arch/arm/mach-omap2/timer.c | 7 +++--
arch/arm/mach-pxa/time.c | 7 +++--
arch/arm/mach-sa1100/time.c | 7 +++--
arch/arm/mach-spear/time.c | 10 +++----
arch/arm/mach-w90x900/time.c | 8 ++++--
arch/arm/plat-iop/time.c | 9 ++++---
arch/arm/plat-orion/time.c | 20 ++++++++++----
arch/avr32/kernel/time.c | 7 ++---
arch/blackfin/kernel/time-ts.c | 14 +++++++---
arch/c6x/platforms/timer64.c | 7 +++--
arch/hexagon/kernel/time.c | 11 +++++---
arch/m68k/platform/coldfire/pit.c | 7 +++--
arch/microblaze/kernel/timer.c | 7 +++--
arch/mips/alchemy/common/time.c | 14 ++++++++--
arch/mips/include/asm/cevt-r4k.h | 2 +-
arch/mips/jazz/irq.c | 14 ++++++++--
arch/mips/jz4740/time.c | 9 ++++---
arch/mips/kernel/cevt-bcm1480.c | 9 ++++---
arch/mips/kernel/cevt-ds1287.c | 10 +++++--
arch/mips/kernel/cevt-gic.c | 14 ++++++++--
arch/mips/kernel/cevt-gt641xx.c | 12 ++++++---
arch/mips/kernel/cevt-r4k.c | 14 ++++++++--
arch/mips/kernel/cevt-sb1250.c | 9 ++++---
arch/mips/kernel/cevt-smtc.c | 2 +-
arch/mips/kernel/cevt-txx9.c | 7 +++--
arch/mips/loongson/common/cs5536/cs5536_mfgpt.c | 9 +++++--
arch/mips/ralink/cevt-rt3352.c | 12 ++++++---
arch/mips/sgi-ip27/ip27-timer.c | 14 ++++++++--
arch/mips/sni/time.c | 7 +++--
arch/mn10300/kernel/cevt-mn10300.c | 14 ++++++++--
arch/openrisc/kernel/time.c | 7 +++--
arch/powerpc/kernel/time.c | 20 +++++++++++---
arch/s390/kernel/time.c | 14 ++++++++--
arch/score/kernel/time.c | 7 ++---
arch/sh/kernel/localtimer.c | 15 +++++++++--
arch/sparc/kernel/time_32.c | 20 +++++++++-----
arch/sparc/kernel/time_64.c | 7 +++--
arch/tile/kernel/time.c | 15 +++++++++--
arch/um/kernel/time.c | 7 +++--
arch/unicore32/kernel/time.c | 7 +++--
arch/x86/kernel/apic/apic.c | 10 ++++---
arch/x86/kernel/hpet.c | 19 +++++++------
arch/x86/lguest/boot.c | 7 +++--
arch/x86/platform/uv/uv_time.c | 9 ++++---
arch/x86/xen/time.c | 14 +++++++---
arch/xtensa/kernel/time.c | 9 ++++---
drivers/clocksource/arm_arch_timer.c | 36 ++++++++++++++-----------
drivers/clocksource/arm_global_timer.c | 9 ++++---
drivers/clocksource/bcm2835_timer.c | 8 +++---
drivers/clocksource/bcm_kona_timer.c | 12 ++++++---
drivers/clocksource/cadence_ttc_timer.c | 7 +++--
drivers/clocksource/cs5535-clockevt.c | 17 +++++++++---
drivers/clocksource/dummy_timer.c | 15 +++++++++--
drivers/clocksource/dw_apb_timer.c | 7 +++--
drivers/clocksource/em_sti.c | 11 +++++---
drivers/clocksource/exynos_mct.c | 16 +++++++----
drivers/clocksource/i8253.c | 8 ++++--
drivers/clocksource/metag_generic.c | 7 +++--
drivers/clocksource/moxart_timer.c | 10 ++++---
drivers/clocksource/mxs_timer.c | 7 +++--
drivers/clocksource/nomadik-mtu.c | 7 +++--
drivers/clocksource/qcom-timer.c | 10 ++++---
drivers/clocksource/samsung_pwm_timer.c | 7 +++--
drivers/clocksource/sh_cmt.c | 9 ++++---
drivers/clocksource/sh_mtu2.c | 9 ++++---
drivers/clocksource/sh_tmu.c | 9 ++++---
drivers/clocksource/sun4i_timer.c | 11 +++++---
drivers/clocksource/tcb_clksrc.c | 11 +++++---
drivers/clocksource/tegra20_timer.c | 7 +++--
drivers/clocksource/time-armada-370-xp.c | 20 ++++++++++----
drivers/clocksource/time-efm32.c | 7 +++--
drivers/clocksource/time-orion.c | 19 ++++++++++---
drivers/clocksource/timer-keystone.c | 9 ++++---
drivers/clocksource/timer-marco.c | 12 ++++++---
drivers/clocksource/timer-prima2.c | 7 +++--
drivers/clocksource/timer-sun5i.c | 11 +++++---
drivers/clocksource/timer-u300.c | 7 +++--
drivers/clocksource/vf_pit_timer.c | 12 ++++++---
drivers/clocksource/vt8500_timer.c | 7 +++--
drivers/clocksource/zevio-timer.c | 8 +++---
include/linux/clockchips.h | 4 +--
kernel/time/clockevents.c | 7 +++--
kernel/time/tick-broadcast-hrtimer.c | 11 +++++---
kernel/time/timer_list.c | 2 +-
105 files changed, 783 insertions(+), 305 deletions(-)
--
2.0.0.rc2
From: Viresh Kumar <viresh.kumar@linaro.org>
3.12-stable review patch. If anyone has any objections, please let me know.
===============
commit 84ea7fe37908254c3bd90910921f6e1045c1747a upstream.
switch_hrtimer_base() calls hrtimer_check_target() which ensures that
we do not migrate a timer to a remote cpu if the timer expires before
the current programmed expiry time on that remote cpu.
But __hrtimer_start_range_ns() calls switch_hrtimer_base() before the
new expiry time is set. So the sanity check in hrtimer_check_target()
is operating on stale or even uninitialized data.
Update expiry time before calling switch_hrtimer_base().
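For reference, the check in question looks conceptually like this (a simplified
sketch of hrtimer_check_target(); hres_active and clock-offset handling are
omitted, so it is not the exact kernel code):

/*
 * The timer must not be moved to a CPU whose next event is already
 * programmed to fire after this timer's expiry. If the expiry has not
 * been set yet, this comparison is meaningless, hence the reordering.
 */
static int hrtimer_check_target(struct hrtimer *timer,
				struct hrtimer_clock_base *new_base)
{
	ktime_t expires = hrtimer_get_expires(timer);

	return expires.tv64 <= new_base->cpu_base->expires_next.tv64;
}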
[ tglx: Rewrote changelog once again ]
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Cc: linaro-kernel@lists.linaro.org
Cc: linaro-networking@linaro.org
Cc: fweisbec@gmail.com
Cc: arvind.chauhan@arm.com
Link: http://lkml.kernel.org/r/81999e148745fc51bbcd0615823fbab9b2e87e23.139988225…
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
---
kernel/hrtimer.c | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/kernel/hrtimer.c b/kernel/hrtimer.c
index 6de65d8a70e2..aa149222cd8e 100644
--- a/kernel/hrtimer.c
+++ b/kernel/hrtimer.c
@@ -1002,11 +1002,8 @@ int __hrtimer_start_range_ns(struct hrtimer *timer, ktime_t tim,
/* Remove an active timer from the queue: */
ret = remove_hrtimer(timer, base);
- /* Switch the timer base, if necessary: */
- new_base = switch_hrtimer_base(timer, base, mode & HRTIMER_MODE_PINNED);
-
if (mode & HRTIMER_MODE_REL) {
- tim = ktime_add_safe(tim, new_base->get_time());
+ tim = ktime_add_safe(tim, base->get_time());
/*
* CONFIG_TIME_LOW_RES is a temporary way for architectures
* to signal that they simply return xtime in
@@ -1021,6 +1018,9 @@ int __hrtimer_start_range_ns(struct hrtimer *timer, ktime_t tim,
hrtimer_set_expires_range_ns(timer, tim, delta_ns);
+ /* Switch the timer base, if necessary: */
+ new_base = switch_hrtimer_base(timer, base, mode & HRTIMER_MODE_PINNED);
+
timer_stats_hrtimer_set_start_info(timer);
leftmost = enqueue_hrtimer(timer, new_base);
--
1.9.3
3.2.60-rc1 review patch. If anyone has any objections, please let me know.
------------------
From: Viresh Kumar <viresh.kumar@linaro.org>
commit 84ea7fe37908254c3bd90910921f6e1045c1747a upstream.
switch_hrtimer_base() calls hrtimer_check_target() which ensures that
we do not migrate a timer to a remote cpu if the timer expires before
the current programmed expiry time on that remote cpu.
But __hrtimer_start_range_ns() calls switch_hrtimer_base() before the
new expiry time is set. So the sanity check in hrtimer_check_target()
is operating on stale or even uninitialized data.
Update expiry time before calling switch_hrtimer_base().
[ tglx: Rewrote changelog once again ]
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Cc: linaro-kernel@lists.linaro.org
Cc: linaro-networking@linaro.org
Cc: fweisbec@gmail.com
Cc: arvind.chauhan@arm.com
Link: http://lkml.kernel.org/r/81999e148745fc51bbcd0615823fbab9b2e87e23.139988225…
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
---
kernel/hrtimer.c | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)
--- a/kernel/hrtimer.c
+++ b/kernel/hrtimer.c
@@ -980,11 +980,8 @@ int __hrtimer_start_range_ns(struct hrti
/* Remove an active timer from the queue: */
ret = remove_hrtimer(timer, base);
- /* Switch the timer base, if necessary: */
- new_base = switch_hrtimer_base(timer, base, mode & HRTIMER_MODE_PINNED);
-
if (mode & HRTIMER_MODE_REL) {
- tim = ktime_add_safe(tim, new_base->get_time());
+ tim = ktime_add_safe(tim, base->get_time());
/*
* CONFIG_TIME_LOW_RES is a temporary way for architectures
* to signal that they simply return xtime in
@@ -999,6 +996,9 @@ int __hrtimer_start_range_ns(struct hrti
hrtimer_set_expires_range_ns(timer, tim, delta_ns);
+ /* Switch the timer base, if necessary: */
+ new_base = switch_hrtimer_base(timer, base, mode & HRTIMER_MODE_PINNED);
+
timer_stats_hrtimer_set_start_info(timer);
leftmost = enqueue_hrtimer(timer, new_base);