The OPP structures are abused to the best here, without understanding
how the OPP core and RCU locks work.
In short, the OPP pointer saved in 'rk3399_dmcfreq' can become invalid
under your nose, as the OPP core may free it.
Fix various abuses around OPP structures and calls.
Compile tested only.
Signed-off-by: Viresh Kumar <viresh.kumar(a)linaro.org>
---
I would like it to go via the PM tree. Perhaps that's already the
default tree for this.
drivers/devfreq/rk3399_dmc.c | 11 +++++------
1 file changed, 5 insertions(+), 6 deletions(-)
diff --git a/drivers/devfreq/rk3399_dmc.c b/drivers/devfreq/rk3399_dmc.c
index 5063ac1a5939..cf14631b1945 100644
--- a/drivers/devfreq/rk3399_dmc.c
+++ b/drivers/devfreq/rk3399_dmc.c
@@ -80,7 +80,6 @@ struct rk3399_dmcfreq {
struct regulator *vdd_center;
unsigned long rate, target_rate;
unsigned long volt, target_volt;
- struct dev_pm_opp *curr_opp;
};
static int rk3399_dmcfreq_target(struct device *dev, unsigned long *freq,
@@ -102,9 +101,6 @@ static int rk3399_dmcfreq_target(struct device *dev, unsigned long *freq,
target_rate = dev_pm_opp_get_freq(opp);
target_volt = dev_pm_opp_get_voltage(opp);
- dmcfreq->rate = dev_pm_opp_get_freq(dmcfreq->curr_opp);
- dmcfreq->volt = dev_pm_opp_get_voltage(dmcfreq->curr_opp);
-
rcu_read_unlock();
if (dmcfreq->rate == target_rate)
@@ -165,7 +161,9 @@ static int rk3399_dmcfreq_target(struct device *dev, unsigned long *freq,
if (err)
dev_err(dev, "Cannot to set vol %lu uV\n", target_volt);
- dmcfreq->curr_opp = opp;
+ dmcfreq->rate = target_rate;
+ dmcfreq->volt = target_volt;
+
out:
mutex_unlock(&dmcfreq->lock);
return err;
@@ -431,8 +429,9 @@ static int rk3399_dmcfreq_probe(struct platform_device *pdev)
rcu_read_unlock();
return PTR_ERR(opp);
}
+ dmcfreq->rate = dev_pm_opp_get_freq(opp);
+ dmcfreq->volt = dev_pm_opp_get_voltage(opp);
rcu_read_unlock();
- data->curr_opp = opp;
rk3399_devfreq_dmc_profile.initial_freq = data->rate;
--
2.7.1.410.g6faf27b
Hello,
Some platforms have the capability to configure the performance state of
their Power Domains. The performance levels are represented by positive
integer values, a lower value represents lower performance state.
We had some discussions about it in the past on the PM list [1], which
is followed by discussions during the LPC. The outcome of all that was
that we should extend Power Domain framework to support active state
power management as well.
The power-domains until now were only concentrating on the idle state
management of the device and this needs to change in order to reuse the
infrastructure of power domains for active state management.
To get a complete picture of the proposed plan, following is what we
need to do:
- Create DT bindings to get domain performance state information for the
platforms.
- Enhance OPP framework to parse these and call into the PM Qos
framework with a performance state request.
- Enhance PM Qos framework to provide the API to be used by consumers
(or OPP framework) and pass it on to the (Generic) Power Domain
framework.
- Enhance Generic Power Domain framework to accept such requests,
accumulate all belonging to a single power domain and call domain
driver specific callback with the performance state we want for the
domain.
- The domain driver shall then, in a platform specific way, set the
requested performance level.
- Note that these features are applicable to the CPU, GPU and other IIO
or non-IIO devices.
- There can be cases where a device can choose between multiple power
domains based on what performance level we want for the device. In
such cases, we should represent the multiplexer with a separate power
domain. In effect, the device (or OPP table) will correspond to a
single power domain, but the backend driver of that domain shall
implement the multiplexing functionality.
This patchset implements the very first part of this chain and
introduces a new optional property for the consumers of the
power-domains: domain-performance-state. This property can be used
directly by the consumer or its OPP table.
--
viresh
[1] https://marc.info/?l=linux-pm&m=147747923708075&w=2
Viresh Kumar (2):
PM / Domains: Introduce domain-performance-state binding
PM / OPP: Introduce domain-performance-state binding to OPP nodes
Documentation/devicetree/bindings/opp/opp.txt | 57 ++++++++++++++++++++++
.../devicetree/bindings/power/power_domain.txt | 6 +++
2 files changed, 63 insertions(+)
--
2.7.1.410.g6faf27b
The OPP structures are abused to the best here, without understanding
how the OPP core and RCU locks work.
In short, the OPP pointer saved 'struct exynos_bus' can become invalid
under your nose, as the OPP core may free it.
Fix various abuses around OPP structures and calls.
Compile tested only.
Signed-off-by: Viresh Kumar <viresh.kumar(a)linaro.org>
---
I would like it to go via the PM tree. Perhaps that's already the
default tree for this.
drivers/devfreq/exynos-bus.c | 29 ++++++++++++++---------------
1 file changed, 14 insertions(+), 15 deletions(-)
diff --git a/drivers/devfreq/exynos-bus.c b/drivers/devfreq/exynos-bus.c
index 29866f7e6d7e..a8ed7792ece2 100644
--- a/drivers/devfreq/exynos-bus.c
+++ b/drivers/devfreq/exynos-bus.c
@@ -35,7 +35,7 @@ struct exynos_bus {
unsigned int edev_count;
struct mutex lock;
- struct dev_pm_opp *curr_opp;
+ unsigned long curr_freq;
struct regulator *regulator;
struct clk *clk;
@@ -99,7 +99,7 @@ static int exynos_bus_target(struct device *dev, unsigned long *freq, u32 flags)
{
struct exynos_bus *bus = dev_get_drvdata(dev);
struct dev_pm_opp *new_opp;
- unsigned long old_freq, new_freq, old_volt, new_volt, tol;
+ unsigned long old_freq, new_freq, new_volt, tol;
int ret = 0;
/* Get new opp-bus instance according to new bus clock */
@@ -113,8 +113,7 @@ static int exynos_bus_target(struct device *dev, unsigned long *freq, u32 flags)
new_freq = dev_pm_opp_get_freq(new_opp);
new_volt = dev_pm_opp_get_voltage(new_opp);
- old_freq = dev_pm_opp_get_freq(bus->curr_opp);
- old_volt = dev_pm_opp_get_voltage(bus->curr_opp);
+ old_freq = bus->curr_freq;
rcu_read_unlock();
if (old_freq == new_freq)
@@ -146,7 +145,7 @@ static int exynos_bus_target(struct device *dev, unsigned long *freq, u32 flags)
goto out;
}
}
- bus->curr_opp = new_opp;
+ bus->curr_freq = new_freq;
dev_dbg(dev, "Set the frequency of bus (%lukHz -> %lukHz)\n",
old_freq/1000, new_freq/1000);
@@ -163,9 +162,7 @@ static int exynos_bus_get_dev_status(struct device *dev,
struct devfreq_event_data edata;
int ret;
- rcu_read_lock();
- stat->current_frequency = dev_pm_opp_get_freq(bus->curr_opp);
- rcu_read_unlock();
+ stat->current_frequency = bus->curr_freq;
ret = exynos_bus_get_event(bus, &edata);
if (ret < 0) {
@@ -226,7 +223,7 @@ static int exynos_bus_passive_target(struct device *dev, unsigned long *freq,
}
new_freq = dev_pm_opp_get_freq(new_opp);
- old_freq = dev_pm_opp_get_freq(bus->curr_opp);
+ old_freq = bus->curr_freq;
rcu_read_unlock();
if (old_freq == new_freq)
@@ -242,7 +239,7 @@ static int exynos_bus_passive_target(struct device *dev, unsigned long *freq,
}
*freq = new_freq;
- bus->curr_opp = new_opp;
+ bus->curr_freq = new_freq;
dev_dbg(dev, "Set the frequency of bus (%lukHz -> %lukHz)\n",
old_freq/1000, new_freq/1000);
@@ -335,6 +332,7 @@ static int exynos_bus_parse_of(struct device_node *np,
struct exynos_bus *bus)
{
struct device *dev = bus->dev;
+ struct dev_pm_opp *opp;
unsigned long rate;
int ret;
@@ -352,22 +350,23 @@ static int exynos_bus_parse_of(struct device_node *np,
}
/* Get the freq and voltage from OPP table to scale the bus freq */
- rcu_read_lock();
ret = dev_pm_opp_of_add_table(dev);
if (ret < 0) {
dev_err(dev, "failed to get OPP table\n");
- rcu_read_unlock();
goto err_clk;
}
rate = clk_get_rate(bus->clk);
- bus->curr_opp = devfreq_recommended_opp(dev, &rate, 0);
- if (IS_ERR(bus->curr_opp)) {
+
+ rcu_read_lock();
+ opp = devfreq_recommended_opp(dev, &rate, 0);
+ if (IS_ERR(opp)) {
dev_err(dev, "failed to find dev_pm_opp\n");
rcu_read_unlock();
- ret = PTR_ERR(bus->curr_opp);
+ ret = PTR_ERR(opp);
goto err_opp;
}
+ bus->curr_freq = dev_pm_opp_get_freq(opp);
rcu_read_unlock();
return 0;
--
2.7.1.410.g6faf27b
From: Stephen Boyd <sboyd(a)codeaurora.org>
Joonyoung Shim reported an interesting problem on his ARM octa-core
Odoroid-XU3 platform. During system suspend, dev_pm_opp_put_regulator()
was failing for a struct device for which dev_pm_opp_set_regulator() is
called earlier.
This happened because an earlier call to
dev_pm_opp_of_cpumask_remove_table() function (from cpufreq-dt.c file)
removed all the entries from opp_table->dev_list apart from the last CPU
device in the cpumask of CPUs sharing the OPP.
But both dev_pm_opp_set_regulator() and dev_pm_opp_put_regulator()
routines get CPU device for the first CPU in the cpumask. And so the OPP
core failed to find the OPP table for the struct device.
In order to fix that up properly, we need to revisit APIs like
dev_pm_opp_set_regulator() and make them talk in terms of cookies
provided by the OPP core. But such a solution will be hard to backport
to stable kernels.
This patch attempts to fix this problem by returning a pointer to the
opp_table from dev_pm_opp_set_regulator() and using that as the
parameter to dev_pm_opp_put_regulator(). This ensures that the
dev_pm_opp_put_regulator() doesn't fail to find the opp table.
Note that similar design problem also exists with other
dev_pm_opp_put_*() APIs, but those aren't used currently by anyone and
so we don't need to update them for now.
[Viresh]: Written commit log, minor improvements in the patch and tested
on exynos 5250.
Cc: # v4.4+ <stable(a)vger.kernel.org>
Reported-by: Joonyoung Shim <jy0922.shim(a)samsung.com>
Signed-off-by: Stephen Boyd <sboyd(a)codeaurora.org>
Signed-off-by: Viresh Kumar <viresh.kumar(a)linaro.org>
---
V3->V4:
- Completely different approach, suggested earlier by Stephen.
- Can be merged safely now as both /me and Stephen agree to this one.
@Joonyoung: Can you please test this last patch please ?
drivers/base/power/opp/core.c | 37 +++++++++++++------------------------
drivers/cpufreq/cpufreq-dt.c | 12 ++++++++----
include/linux/pm_opp.h | 11 ++++++-----
3 files changed, 27 insertions(+), 33 deletions(-)
diff --git a/drivers/base/power/opp/core.c b/drivers/base/power/opp/core.c
index 4c7c6da7a989..de27562d9ced 100644
--- a/drivers/base/power/opp/core.c
+++ b/drivers/base/power/opp/core.c
@@ -1316,58 +1316,57 @@ EXPORT_SYMBOL_GPL(dev_pm_opp_put_prop_name);
* that this function is *NOT* called under RCU protection or in contexts where
* mutex cannot be locked.
*/
-int dev_pm_opp_set_regulator(struct device *dev, const char *name)
+struct opp_table *dev_pm_opp_set_regulator(struct device *dev, const char *name)
{
struct opp_table *opp_table;
struct regulator *reg;
- int ret;
mutex_lock(&opp_table_lock);
opp_table = _add_opp_table(dev);
if (!opp_table) {
- ret = -ENOMEM;
+ opp_table = ERR_PTR(-ENOMEM);
goto unlock;
}
/* This should be called before OPPs are initialized */
if (WARN_ON(!list_empty(&opp_table->opp_list))) {
- ret = -EBUSY;
+ opp_table = ERR_PTR(-EBUSY);
goto err;
}
/* Already have a regulator set */
if (WARN_ON(!IS_ERR(opp_table->regulator))) {
- ret = -EBUSY;
+ opp_table = ERR_PTR(-EBUSY);
goto err;
}
/* Allocate the regulator */
reg = regulator_get_optional(dev, name);
if (IS_ERR(reg)) {
- ret = PTR_ERR(reg);
- if (ret != -EPROBE_DEFER)
- dev_err(dev, "%s: no regulator (%s) found: %d\n",
- __func__, name, ret);
+ opp_table = (struct opp_table *)reg;
+ if (PTR_ERR(reg) != -EPROBE_DEFER)
+ dev_err(dev, "%s: no regulator (%s) found: %ld\n",
+ __func__, name, PTR_ERR(reg));
goto err;
}
opp_table->regulator = reg;
mutex_unlock(&opp_table_lock);
- return 0;
+ return opp_table;
err:
_remove_opp_table(opp_table);
unlock:
mutex_unlock(&opp_table_lock);
- return ret;
+ return opp_table;
}
EXPORT_SYMBOL_GPL(dev_pm_opp_set_regulator);
/**
* dev_pm_opp_put_regulator() - Releases resources blocked for regulator
- * @dev: Device for which regulator was set.
+ * @opp_table: opp_table returned from dev_pm_opp_set_regulator
*
* Locking: The internal opp_table and opp structures are RCU protected.
* Hence this function internally uses RCU updater strategy with mutex locks
@@ -1375,22 +1374,12 @@ EXPORT_SYMBOL_GPL(dev_pm_opp_set_regulator);
* that this function is *NOT* called under RCU protection or in contexts where
* mutex cannot be locked.
*/
-void dev_pm_opp_put_regulator(struct device *dev)
+void dev_pm_opp_put_regulator(struct opp_table *opp_table)
{
- struct opp_table *opp_table;
-
mutex_lock(&opp_table_lock);
- /* Check for existing table for 'dev' first */
- opp_table = _find_opp_table(dev);
- if (IS_ERR(opp_table)) {
- dev_err(dev, "Failed to find opp_table: %ld\n",
- PTR_ERR(opp_table));
- goto unlock;
- }
-
if (IS_ERR(opp_table->regulator)) {
- dev_err(dev, "%s: Doesn't have regulator set\n", __func__);
+ pr_err("%s: Doesn't have regulator set\n", __func__);
goto unlock;
}
diff --git a/drivers/cpufreq/cpufreq-dt.c b/drivers/cpufreq/cpufreq-dt.c
index 5c07ae05d69a..4d3ec92cbabf 100644
--- a/drivers/cpufreq/cpufreq-dt.c
+++ b/drivers/cpufreq/cpufreq-dt.c
@@ -28,6 +28,7 @@
#include "cpufreq-dt.h"
struct private_data {
+ struct opp_table *opp_table;
struct device *cpu_dev;
struct thermal_cooling_device *cdev;
const char *reg_name;
@@ -143,6 +144,7 @@ static int resources_available(void)
static int cpufreq_init(struct cpufreq_policy *policy)
{
struct cpufreq_frequency_table *freq_table;
+ struct opp_table *opp_table = NULL;
struct private_data *priv;
struct device *cpu_dev;
struct clk *cpu_clk;
@@ -186,8 +188,9 @@ static int cpufreq_init(struct cpufreq_policy *policy)
*/
name = find_supply_name(cpu_dev);
if (name) {
- ret = dev_pm_opp_set_regulator(cpu_dev, name);
- if (ret) {
+ opp_table = dev_pm_opp_set_regulator(cpu_dev, name);
+ if (IS_ERR(opp_table)) {
+ ret = PTR_ERR(opp_table);
dev_err(cpu_dev, "Failed to set regulator for cpu%d: %d\n",
policy->cpu, ret);
goto out_put_clk;
@@ -237,6 +240,7 @@ static int cpufreq_init(struct cpufreq_policy *policy)
}
priv->reg_name = name;
+ priv->opp_table = opp_table;
ret = dev_pm_opp_init_cpufreq_table(cpu_dev, &freq_table);
if (ret) {
@@ -285,7 +289,7 @@ static int cpufreq_init(struct cpufreq_policy *policy)
out_free_opp:
dev_pm_opp_of_cpumask_remove_table(policy->cpus);
if (name)
- dev_pm_opp_put_regulator(cpu_dev);
+ dev_pm_opp_put_regulator(opp_table);
out_put_clk:
clk_put(cpu_clk);
@@ -300,7 +304,7 @@ static int cpufreq_exit(struct cpufreq_policy *policy)
dev_pm_opp_free_cpufreq_table(priv->cpu_dev, &policy->freq_table);
dev_pm_opp_of_cpumask_remove_table(policy->related_cpus);
if (priv->reg_name)
- dev_pm_opp_put_regulator(priv->cpu_dev);
+ dev_pm_opp_put_regulator(priv->opp_table);
clk_put(policy->clk);
kfree(priv);
diff --git a/include/linux/pm_opp.h b/include/linux/pm_opp.h
index bca26157f5b6..f6bc76501912 100644
--- a/include/linux/pm_opp.h
+++ b/include/linux/pm_opp.h
@@ -19,6 +19,7 @@
struct dev_pm_opp;
struct device;
+struct opp_table;
enum dev_pm_opp_event {
OPP_EVENT_ADD, OPP_EVENT_REMOVE, OPP_EVENT_ENABLE, OPP_EVENT_DISABLE,
@@ -62,8 +63,8 @@ int dev_pm_opp_set_supported_hw(struct device *dev, const u32 *versions,
void dev_pm_opp_put_supported_hw(struct device *dev);
int dev_pm_opp_set_prop_name(struct device *dev, const char *name);
void dev_pm_opp_put_prop_name(struct device *dev);
-int dev_pm_opp_set_regulator(struct device *dev, const char *name);
-void dev_pm_opp_put_regulator(struct device *dev);
+struct opp_table *dev_pm_opp_set_regulator(struct device *dev, const char *name);
+void dev_pm_opp_put_regulator(struct opp_table *opp_table);
int dev_pm_opp_set_rate(struct device *dev, unsigned long target_freq);
int dev_pm_opp_set_sharing_cpus(struct device *cpu_dev, const struct cpumask *cpumask);
int dev_pm_opp_get_sharing_cpus(struct device *cpu_dev, struct cpumask *cpumask);
@@ -170,12 +171,12 @@ static inline int dev_pm_opp_set_prop_name(struct device *dev, const char *name)
static inline void dev_pm_opp_put_prop_name(struct device *dev) {}
-static inline int dev_pm_opp_set_regulator(struct device *dev, const char *name)
+static inline struct opp_table *dev_pm_opp_set_regulator(struct device *dev, const char *name)
{
- return -ENOTSUPP;
+ return ERR_PTR(-ENOTSUPP);
}
-static inline void dev_pm_opp_put_regulator(struct device *dev) {}
+static inline void dev_pm_opp_put_regulator(struct opp_table *opp_table) {}
static inline int dev_pm_opp_set_rate(struct device *dev, unsigned long target_freq)
{
--
2.7.1.410.g6faf27b
Hi,
Some platforms (like TI) have complex DVFS configuration for CPU
devices, where multiple regulators are required to be configured to
change DVFS state of the device. This was explained well by Nishanth
earlier [1].
One of the major complaints around multiple regulators case was that the
DT isn't responsible in any way to represent the order in which multiple
supplies need to be programmed, before or after a frequency change. It
was considered in this patch and such information is left for the
platform specific OPP driver now, which can register its own
opp_set_rate() callback with the OPP core and the OPP core will then
call it during DVFS.
The patches are tested on Exynos5250 (Dual A15). I have hacked around DT
and code to pass values for multiple regulators and verified that they
are all properly read by the kernel (using debugfs interface).
Dave Gerlach has already tested [2] it on the real TI platforms and it
works well for him.
This is rebased over: linux-next branch in the PM tree.
V5->V6:
- Rebased over a recent fix (and resolved rebase conflicts) that will
get applied before this series:
https://marc.info/?l=linux-kernel&m=148047843010418&w=2
V4->V5:
- Stephen boyd had some minor review comments and gave his Reviewed-by
tag for the rest. Only 2 patches don't have his RBY tag.
- Individual patches contain the version history from V4 to V5.
V3->V4:
- Separate out cpu-supply fix in the binding in a separate patch (Mark).
- Add more documentation to the binding to explain that the relation to
the supplies and the order of programming them is left for the
platform specific bindings and that every platform using multiple
regulators for their devices needs to provide a separate binding
document explaining their implementation (Mark).
- @Rob and Stephen: I have kept your Acks for the bindings as the
bindings only got a bit reworded (improved) since the time you guys
Acked them. Please let me know if you want more improvement in the
bindings now.
- V4 for 10/10 was sent earlier, which added a missing
rcu_read_unlock(). Nothing else changed in it.
- Added some missing Kernel documentation comments
V2->V3:
- The last patch is new
- Removed a debug leftover pr_info() message
- Renamed few names as s/set_rate/set_opp
- Removed a TODO comment (as it is done now with this series)
- created struct for min_uV and max_uV
- kerneldoc comments for structures in pm_opp.h
- s/const char */const char * const
- use kasprintf()
- Some more minor reformatting
- More Ack/RBY tags added
V1->V2:
- Ack from Rob for 1st patch
- Moved the supplies structure to pm_opp.h (Dave)
- Fixed an compilation warning.
--
viresh
[1] https://marc.info/?l=linux-pm&m=145684495832764&w=2
[2] https://marc.info/?l=linux-kernel&m=147924789305276&w=2
Viresh Kumar (10):
PM / OPP: Fix incorrect cpu-supply property in binding
PM / OPP: Reword binding supporting multiple regulators per device
PM / OPP: Don't use OPP structure outside of rcu protected section
PM / OPP: Manage supply's voltage/current in a separate structure
PM / OPP: Pass struct dev_pm_opp_supply to _set_opp_voltage()
PM / OPP: Add infrastructure to manage multiple regulators
PM / OPP: Separate out _generic_set_opp()
PM / OPP: Allow platform specific custom set_opp() callbacks
PM / OPP: Don't WARN on multiple calls to dev_pm_opp_set_regulators()
PM / OPP: Don't assume platform doesn't have regulators
Documentation/devicetree/bindings/opp/opp.txt | 27 +-
drivers/base/power/opp/core.c | 535 ++++++++++++++++++++------
drivers/base/power/opp/debugfs.c | 52 ++-
drivers/base/power/opp/of.c | 105 +++--
drivers/base/power/opp/opp.h | 22 +-
drivers/cpufreq/cpufreq-dt.c | 9 +-
include/linux/pm_opp.h | 69 +++-
7 files changed, 635 insertions(+), 184 deletions(-)
--
2.7.1.410.g6faf27b