This is only relevant to implementations with multiple clusters, where clusters have separate clock lines but all CPUs within a cluster share it.
Consider a dual cluster platform with 2 cores per cluster. During suspend we start offlining CPUs from 1 to 3. When CPU2 is removed, policy->kobj would be moved to CPU3 and when CPU3 goes down we wouldn't free policy or its kobj.
Now on resume, we will get CPU2 before CPU3 and will call __cpufreq_add_dev(). We will recover the old policy and update policy->cpu from 3 to 2 via update_policy_cpu().
But the kobj is still tied to CPU3 and wasn't moved to CPU2. We wouldn't create a link for CPU2, but would try that while bringing CPU3 online, which will report errors as CPU3 already has the kobj assigned to it.
This bug got introduced with commit 42f921a, which overlooked this scenario.
To fix this, let's move the kobj to the new policy->cpu while bringing the first CPU of a cluster back.
Fixes: ("42f921a cpufreq: remove sysfs files for CPUs which failed to come back after resume")
Cc: Stable stable@vger.kernel.org # 3.13+
Reported-by: Bu Yitian ybu@qti.qualcomm.com
Reported-by: Saravana Kannan skannan@codeaurora.org
Signed-off-by: Viresh Kumar viresh.kumar@linaro.org
---
Hi Rafael,
This is for 3.16 release, please take it once Yitian/Saravana test this out.
@Yitian/Saravana: Sorry for overlooking this when both of you reported it first. I (and Srivatsa as well) was damn sure that this scenario was taken into account in the current code, and a close look proved that wrong.
I couldn't test it out, can any of you please see if it fixes things for you?
 drivers/cpufreq/cpufreq.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
index 62259d2..6f02485 100644
--- a/drivers/cpufreq/cpufreq.c
+++ b/drivers/cpufreq/cpufreq.c
@@ -1153,10 +1153,12 @@ static int __cpufreq_add_dev(struct device *dev, struct subsys_interface *sif)
 	 * the creation of a brand new one. So we need to perform this update
 	 * by invoking update_policy_cpu().
 	 */
-	if (recover_policy && cpu != policy->cpu)
+	if (recover_policy && cpu != policy->cpu) {
 		update_policy_cpu(policy, cpu);
-	else
+		WARN_ON(kobject_move(&policy->kobj, &dev->kobj));
+	} else {
 		policy->cpu = cpu;
+	}
 
 	cpumask_copy(policy->cpus, cpumask_of(cpu));
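For readers following the scenario in the changelog, here is a minimal userspace sketch of it. It is only a toy model: struct toy_policy and every toy_* function are hypothetical names made up for illustration, not kernel code. It assumes the state relevant to this bug can be reduced to two integers, policy->cpu and the CPU under which the policy kobj currently sits.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only; NOT the kernel's struct cpufreq_policy. */
struct toy_policy {
	int cpu;        /* stands in for policy->cpu */
	int kobj_owner; /* CPU whose sysfs directory currently holds the policy kobj */
};

/* Suspend path: the managing CPU goes away and a sibling takes over.
 * Both the kobj and policy->cpu follow the sibling. */
static void toy_nominate_new_cpu(struct toy_policy *p, int new_cpu)
{
	assert(p);
	p->kobj_owner = new_cpu;
	p->cpu = new_cpu;
}

/* Resume path: a CPU comes back and recovers the saved policy.
 * move_kobj == false models the code before the patch. */
static void toy_recover_policy(struct toy_policy *p, int cpu, bool move_kobj)
{
	assert(p);
	if (cpu != p->cpu) {
		p->cpu = cpu;                /* the update_policy_cpu() step */
		if (move_kobj)
			p->kobj_owner = cpu; /* the kobject_move() step added by the patch */
	}
}

/* Replays the second cluster (CPU2, CPU3) through suspend and the first
 * CPU's resume; returns true if the kobj ends up under policy->cpu. */
static bool toy_cluster_resume_ok(bool move_kobj)
{
	struct toy_policy p = { .cpu = 2, .kobj_owner = 2 };

	toy_nominate_new_cpu(&p, 3);          /* CPU2 goes offline */
	toy_recover_policy(&p, 2, move_kobj); /* CPU2 comes back first */
	return p.cpu == p.kobj_owner;
}
```

Calling toy_cluster_resume_ok(false) leaves the kobj under CPU3 while policy->cpu is 2, which is the mismatch described above; with true the two agree again.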
Hi Viresh:
I have verified this patch, all the issues I reported disappeared. Thanks for the quick fix.
Best Regards
-----Original Message-----
From: Viresh Kumar [mailto:viresh.kumar@linaro.org]
Sent: Thursday, July 10, 2014 1:19 PM
To: rjw@rjwysocki.net
Cc: linaro-kernel@lists.linaro.org; linux-pm@vger.kernel.org; arvind.chauhan@arm.com; srivatsa@mit.edu; skannan@codeaurora.org; Bu, Yitian; Viresh Kumar; Stable
Subject: [PATCH] cpufreq: move policy kobj to policy->cpu at resume
On 10 July 2014 12:38, Bu, Yitian ybu@qti.qualcomm.com wrote:
> Hi Viresh:
> I have verified this patch, all the issues I reported disappeared. Thanks for the quick fix.
Please don't top post. http://en.wikipedia.org/wiki/Posting_style
Rafael would be adding your:
Tested-by: Bu Yitian ybu@qti.qualcomm.com
Let us know if you don't want that.
-----Original Message-----
From: Viresh Kumar [mailto:viresh.kumar@linaro.org]
Sent: Thursday, July 10, 2014 3:10 PM
To: Bu, Yitian
Cc: rjw@rjwysocki.net; linaro-kernel@lists.linaro.org; linux-pm@vger.kernel.org; arvind.chauhan@arm.com; srivatsa@mit.edu; skannan@codeaurora.org; Stable
Subject: Re: [PATCH] cpufreq: move policy kobj to policy->cpu at resume
It is ok, thanks.
On 07/10/2014 10:49 AM, Viresh Kumar wrote:
> This is only relevant to implementations with multiple clusters, where clusters have separate clock lines but all CPUs within a cluster share it.
> Consider a dual cluster platform with 2 cores per cluster. During suspend we start offlining CPUs from 1 to 3. When CPU2 is removed, policy->kobj would be moved to CPU3 and when CPU3 goes down we wouldn't free policy or its kobj.
> Now on resume, we will get CPU2 before CPU3 and will call __cpufreq_add_dev(). We will recover the old policy and update policy->cpu from 3 to 2 via update_policy_cpu().
> But the kobj is still tied to CPU3 and wasn't moved to CPU2. We wouldn't create a link for CPU2, but would try that while bringing CPU3 online, which will report errors as CPU3 already has the kobj assigned to it.
> This bug got introduced with commit 42f921a, which overlooked this scenario.
> To fix this, let's move the kobj to the new policy->cpu while bringing the first CPU of a cluster back.
> Fixes: ("42f921a cpufreq: remove sysfs files for CPUs which failed to come back after resume")
> Cc: Stable stable@vger.kernel.org # 3.13+
> Reported-by: Bu Yitian ybu@qti.qualcomm.com
> Reported-by: Saravana Kannan skannan@codeaurora.org
> Signed-off-by: Viresh Kumar viresh.kumar@linaro.org
Looks good to me. But I think it would be better to move the invocation of kobject_move() to update_policy_cpu() itself, so that update_policy_cpu() will do all the work involved in updating the policy->cpu, as its name suggests.
With that small nit,
Reviewed-by: Srivatsa S. Bhat srivatsa@mit.edu
Regards, Srivatsa S. Bhat
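To make the suggestion above concrete, here is a rough userspace sketch of what consolidating the two steps could look like. It is not the V2 patch and not kernel code: struct sketch_policy, sketch_kobj_move() and sketch_update_policy_cpu() are hypothetical stand-ins, and the assumption that a failed move should leave the policy untouched is a choice made for this illustration only.

```c
#include <assert.h>
#include <errno.h>

/* Toy stand-in for the policy; NOT the kernel's struct cpufreq_policy. */
struct sketch_policy {
	int cpu;        /* stands in for policy->cpu */
	int kobj_owner; /* CPU whose sysfs directory holds the policy kobj */
};

/* Hypothetical stand-in for the kobject_move() step. It fails for a
 * negative CPU number so that the error path can be exercised. */
static int sketch_kobj_move(struct sketch_policy *p, int cpu)
{
	assert(p);
	if (cpu < 0)
		return -EINVAL;
	p->kobj_owner = cpu;
	return 0;
}

/* One helper that does everything involved in changing policy->cpu, so
 * that the remove path and the resume path can both call it. */
static int sketch_update_policy_cpu(struct sketch_policy *p, int cpu)
{
	int ret;

	assert(p);
	if (cpu == p->cpu)
		return 0;         /* nothing to do */

	ret = sketch_kobj_move(p, cpu);
	if (ret)
		return ret;       /* assumption: leave the policy untouched on failure */

	p->cpu = cpu;
	return 0;
}
```

In this model both callers get the kobj and policy->cpu updated together, so neither path can forget one of the two steps.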
On 10 July 2014 16:45, Srivatsa S. Bhat srivatsa@mit.edu wrote:
> Looks good to me. But I think it would be better to move the invocation of kobject_move() to update_policy_cpu() itself, so that update_policy_cpu() will do all the work involved in updating the policy->cpu, as its name suggests.
It's called from the remove path as well ..
On 07/10/2014 04:50 PM, Viresh Kumar wrote:
> It's called from the remove path as well ..
I know.. That's why it makes even more sense to consolidate all the work into one function. We can restructure cpufreq_nominate_new_policy_cpu() such that the kobject_move() can be moved to update_policy_cpu().
Regards, Srivatsa S. Bhat
On 10 July 2014 16:52, Srivatsa S. Bhat srivatsa@mit.edu wrote:
> I know.. That's why it makes even more sense to consolidate all the work into one function. We can restructure cpufreq_nominate_new_policy_cpu() such that the kobject_move() can be moved to update_policy_cpu().
Done.
Hi Rafael,
I had a chat with Srivatsa about this patch and the V2 version which people aren't able to test yet.
I proposed that we take this patch as is and hold V2 for some time.
- V2 wouldn't apply cleanly to stable kernels for sure
- V2 isn't yet tested and V1 is.
- Saravana already proposed a patch which would remove most of what V2 is adding: http://www.spinics.net/lists/arm-kernel/msg346604.html
Our chats:
<vireshk> srivatsa, http://www.spinics.net/lists/arm-kernel/msg346604.html
<srivatsa> vireshk, thanks for the pointer.. will try to take a look by tonight..
<vireshk> srivatsa, Because saravana is actually looking to change much of the stuff, what about dropping the V2 fix that I sent yesterday and take V1 only for now?
<vireshk> srivatsa, As V1 would apply cleanly over stable kernels as well
<srivatsa> vireshk, sure, no problem... perhaps you can make the code reorganization of the kobject_move as a separate patch and submit it for merge window instead of -rc
<vireshk> srivatsa, I was even thinking of that as well
<vireshk> Have two patches, only first one for rc and second one for next release
<vireshk> srivatsa, But the second patch would be overwritten by Saravanna, so might not be of any use
<srivatsa> vireshk, ah, i see.. in that case, we can hold off on the second patch for now.. just v1 should do..
What do you say?