After upgrading to Linux 5.13.3 I noticed my laptop would shut down due to overheating (when it should not). It turned out this was due to commit fe6a6de6692e ("thermal/drivers/int340x/processor_thermal: Fix tcc setting").
What happens is that this driver uses a global variable (tcc_offset_save) to keep track of the tcc offset and uses it on resume. The issue is that this variable is initialized to 0 but is only set in tcc_offset_degree_celsius_store, i.e. when the tcc offset is explicitly set by userspace. If that does not happen, the resume path sets the offset to 0 (in my case, the h/w default being 3, the offset would become too low after a suspend/resume cycle).
The issue did not arise before commit fe6a6de6692e, as the function setting the offset would return if the offset was 0. This is no longer the case (rightfully).
Fix this by not applying the offset if it wasn't saved before, reverting to the old logic. A better approach will come later, but this one will be easier to apply to stable kernels.
The logic to restore the offset after a resume was there long before commit fe6a6de6692e, but as a value of 0 was considered invalid, I'm referencing the commit that made the issue possible in the Fixes tag instead.
Fixes: fe6a6de6692e ("thermal/drivers/int340x/processor_thermal: Fix tcc setting")
Cc: stable@vger.kernel.org
Cc: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Antoine Tenart <atenart@kernel.org>
---
 .../thermal/intel/int340x_thermal/processor_thermal_device.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/thermal/intel/int340x_thermal/processor_thermal_device.c b/drivers/thermal/intel/int340x_thermal/processor_thermal_device.c
index 0f0038af2ad4..fb64acfd5e07 100644
--- a/drivers/thermal/intel/int340x_thermal/processor_thermal_device.c
+++ b/drivers/thermal/intel/int340x_thermal/processor_thermal_device.c
@@ -107,7 +107,7 @@ static int tcc_offset_update(unsigned int tcc)
 	return 0;
 }
 
-static unsigned int tcc_offset_save;
+static int tcc_offset_save = -1;
 
 static ssize_t tcc_offset_degree_celsius_store(struct device *dev,
 				struct device_attribute *attr, const char *buf,
@@ -352,7 +352,8 @@ int proc_thermal_resume(struct device *dev)
 	proc_dev = dev_get_drvdata(dev);
 	proc_thermal_read_ppcc(proc_dev);
 
-	tcc_offset_update(tcc_offset_save);
+	if (tcc_offset_save >= 0)
+		tcc_offset_update(tcc_offset_save);
 
 	return 0;
 }
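For readers who want to see the effect of the sentinel outside the kernel, here is a minimal user-space sketch of the resume logic before and after this patch. It is a model only, not driver code: hw_tcc_offset, HW_DEFAULT_TCC_OFFSET and the two offset_after_resume_*() helpers are hypothetical names, the register is a plain variable rather than an MSR, and the model assumes a suspend/resume cycle resets the register to the firmware default (3, as in the report above).

```c
/* Hypothetical firmware default, taken from the report above. */
#define HW_DEFAULT_TCC_OFFSET 3

/* Stand-in for the hardware TCC offset register (a plain variable here). */
static int hw_tcc_offset;

/* Simplified model of tcc_offset_update(): since fe6a6de6692e, 0 is accepted. */
static int tcc_offset_update(unsigned int tcc)
{
	hw_tcc_offset = (int)tcc;
	return 0;
}

/*
 * Resume path before the fix: the saved value is a zero-initialised
 * unsigned int and is always written back, even if userspace never set it.
 */
int offset_after_resume_buggy(void)
{
	unsigned int tcc_offset_save = 0; /* never written by userspace */

	hw_tcc_offset = HW_DEFAULT_TCC_OFFSET; /* assumed state after suspend */
	tcc_offset_update(tcc_offset_save);
	return hw_tcc_offset;
}

/*
 * Resume path after the fix: -1 means "nothing saved", so the register
 * keeps whatever value the firmware programmed.
 */
int offset_after_resume_fixed(int tcc_offset_save)
{
	hw_tcc_offset = HW_DEFAULT_TCC_OFFSET; /* assumed state after suspend */
	if (tcc_offset_save >= 0)
		tcc_offset_update((unsigned int)tcc_offset_save);
	return hw_tcc_offset;
}
```

In this model, offset_after_resume_buggy() returns 0 (the firmware default is lost), offset_after_resume_fixed(-1) returns 3 (the default is kept), and offset_after_resume_fixed(5) returns 5 (an offset explicitly stored by userspace is still restored).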
Hi Daniel,
This patch is important. Can we send it for a 5.15-rc release?
I see the previous version of this patch is applied to linux-next. But this series is better, as it is split into two patches. The first one can be easily backported and will fix the problem. The second one is an improvement.
Thanks,
Srinivas
On Thu, 2021-09-09 at 10:56 +0200, Antoine Tenart wrote:
> [...]
On 24/09/2021 18:27, Srinivas Pandruvada wrote:
> Hi Daniel,
> This patch is important. Can we send it for a 5.15-rc release?
> I see the previous version of this patch is applied to linux-next. But this series is better, as it is split into two patches. The first one can be easily backported and will fix the problem. The second one is an improvement.
Yes, it is in the pipe.
I've applied patch 1/2 to the fixes branch, and patch 2/2 will land in the next branch as soon as the next -rc is released with the fix and merged into the next branch.
On Fri, 2021-09-24 at 19:40 +0200, Daniel Lezcano wrote:
> On 24/09/2021 18:27, Srinivas Pandruvada wrote:
> > Hi Daniel,
> > This patch is important. Can we send it for a 5.15-rc release?
> > I see the previous version of this patch is applied to linux-next. But this series is better, as it is split into two patches. The first one can be easily backported and will fix the problem. The second one is an improvement.
> Yes, it is in the pipe.
> I've applied patch 1/2 to the fixes branch, and patch 2/2 will land in the next branch as soon as the next -rc is released with the fix and merged into the next branch.
Thanks Daniel.
-Srinivas
Hello Daniel,
Quoting Daniel Lezcano (2021-09-24 19:40:13)
> I've applied patch 1/2 to the fixes branch, and patch 2/2 will land in the next branch as soon as the next -rc is released with the fix and merged into the next branch.
I don't see it in thermal/next even though patch 1 has made it. I'm not sure whether patch 2 has slipped through the cracks or just wasn't pushed yet. If it's the latter, please ignore this mail.
Thanks!
Antoine
On 20/10/2021 15:38, Antoine Tenart wrote:
> Hello Daniel,
> Quoting Daniel Lezcano (2021-09-24 19:40:13)
> > I've applied patch 1/2 to the fixes branch, and patch 2/2 will land in the next branch as soon as the next -rc is released with the fix and merged into the next branch.
> I don't see it in thermal/next even though patch 1 has made it. I'm not sure whether patch 2 has slipped through the cracks or just wasn't pushed yet. If it's the latter, please ignore this mail.
Indeed, I thought I had picked it up, but I hadn't.
Thanks for the heads-up, it is applied now.
-- D.
Quoting Daniel Lezcano (2021-10-21 11:47:50)
> On 20/10/2021 15:38, Antoine Tenart wrote:
> > Quoting Daniel Lezcano (2021-09-24 19:40:13)
> > > I've applied patch 1/2 to the fixes branch, and patch 2/2 will land in the next branch as soon as the next -rc is released with the fix and merged into the next branch.
> > I don't see it in thermal/next even though patch 1 has made it. I'm not sure whether patch 2 has slipped through the cracks or just wasn't pushed yet. If it's the latter, please ignore this mail.
> Indeed, I thought I had picked it up, but I hadn't.
> Thanks for the heads-up, it is applied now.
Thanks!
linux-stable-mirror@lists.linaro.org