Patches 1 and 2 of this series fix the issue reported by Hsin-Te Yuan [1] where MT8192-based Chromebooks are not able to suspend/resume 10 times in a row. Either one of those patches on its own is enough to fix the issue, but I believe both are desirable, so I've included them both here.
Patches 3-5 fix unrelated issues that I've noticed while debugging. Patch 3 fixes IRQ storms when the temperature sensors drop to 20 Celsius. Patches 4 and 5 are cleanups to prevent future issues.
To test this series, I've run 'rtcwake -m mem -d 60' 10 times in a row on an MT8192-Asurada-Spherion-rev3 Chromebook and checked that the wakeup happened 60 seconds later (+/- 5 seconds). I've repeated that test on 10 separate runs. Not once did the Chromebook wake up early with the series applied.
I've also checked that during those runs, the LVTS interrupt didn't trigger even once, while before the series it would trigger a few times per run, generally during boot or resume.
Finally, as a sanity check I've verified that the interrupts still work by lowering the thermal trip point to 45 Celsius and running 'stress -c 8'. Indeed they still do, and the temperature shown by the thermal_temperature ftrace event matched the expected value.
[1] https://lore.kernel.org/all/20241108-lvts-v1-1-eee339c6ca20@chromium.org/
Signed-off-by: Nícolas F. R. A. Prado <nfraprado@collabora.com>
---
Nícolas F. R. A. Prado (5):
      thermal/drivers/mediatek/lvts: Disable monitor mode during suspend
      thermal/drivers/mediatek/lvts: Disable Stage 3 thermal threshold
      thermal/drivers/mediatek/lvts: Disable low offset IRQ for minimum threshold
      thermal/drivers/mediatek/lvts: Start sensor interrupts disabled
      thermal/drivers/mediatek/lvts: Only update IRQ enable for valid sensors

 drivers/thermal/mediatek/lvts_thermal.c | 103 ++++++++++++++++++++++----------
 1 file changed, 72 insertions(+), 31 deletions(-)
---
base-commit: b852e1e7a0389ed6168ef1d38eb0bad71a6b11e8
change-id: 20241121-mt8192-lvts-filtered-suspend-fix-a5032ca8eceb
Best regards,
When configured in filtered mode, the LVTS thermal controller will monitor the temperature from the sensors and trigger an interrupt once a thermal threshold is crossed.
Currently this is true even during suspend and resume. The problem with that is that when enabling the internal clock of the LVTS controller in lvts_ctrl_set_enable() during resume, the temperature reading can glitch and appear much higher than the real one, resulting in a spurious interrupt getting generated.
Disable the temperature monitoring and give some time for the signals to stabilize during suspend in order to prevent such spurious interrupts.
Cc: stable@vger.kernel.org
Reported-by: Hsin-Te Yuan <yuanhsinte@chromium.org>
Closes: https://lore.kernel.org/all/20241108-lvts-v1-1-eee339c6ca20@chromium.org/
Fixes: 8137bb90600d ("thermal/drivers/mediatek/lvts_thermal: Add suspend and resume")
Signed-off-by: Nícolas F. R. A. Prado <nfraprado@collabora.com>
---
 drivers/thermal/mediatek/lvts_thermal.c | 36 +++++++++++++++++++++++++++++++--
 1 file changed, 34 insertions(+), 2 deletions(-)

diff --git a/drivers/thermal/mediatek/lvts_thermal.c b/drivers/thermal/mediatek/lvts_thermal.c
index 1997e91bb3be94a3059db619238aa5787edc7675..a92ff2325c40704adc537af6995b34f93c3b0650 100644
--- a/drivers/thermal/mediatek/lvts_thermal.c
+++ b/drivers/thermal/mediatek/lvts_thermal.c
@@ -860,6 +860,32 @@ static int lvts_ctrl_init(struct device *dev, struct lvts_domain *lvts_td,
 	return 0;
 }
 
+static void lvts_ctrl_monitor_enable(struct device *dev, struct lvts_ctrl *lvts_ctrl, bool enable)
+{
+	/*
+	 * Bitmaps to enable each sensor on filtered mode in the MONCTL0
+	 * register.
+	 */
+	u32 sensor_filt_bitmap[] = { BIT(0), BIT(1), BIT(2), BIT(3) };
+	u32 sensor_map = 0;
+	int i;
+
+	if (lvts_ctrl->mode != LVTS_MSR_FILTERED_MODE)
+		return;
+
+	if (enable) {
+		lvts_for_each_valid_sensor(i, lvts_ctrl)
+			sensor_map |= sensor_filt_bitmap[i];
+	}
+
+	/*
+	 * Bits:
+	 *      9: Single point access flow
+	 *    0-3: Enable sensing point 0-3
+	 */
+	writel(sensor_map | BIT(9), LVTS_MONCTL0(lvts_ctrl->base));
+}
+
 /*
  * At this point the configuration register is the only place in the
  * driver where we write multiple values. Per hardware constraint,
@@ -1381,8 +1407,11 @@ static int lvts_suspend(struct device *dev)
 
 	lvts_td = dev_get_drvdata(dev);
 
-	for (i = 0; i < lvts_td->num_lvts_ctrl; i++)
+	for (i = 0; i < lvts_td->num_lvts_ctrl; i++) {
+		lvts_ctrl_monitor_enable(dev, &lvts_td->lvts_ctrl[i], false);
+		usleep_range(100, 200);
 		lvts_ctrl_set_enable(&lvts_td->lvts_ctrl[i], false);
+	}
 
 	clk_disable_unprepare(lvts_td->clk);
 
@@ -1400,8 +1429,11 @@ static int lvts_resume(struct device *dev)
 	if (ret)
 		return ret;
 
-	for (i = 0; i < lvts_td->num_lvts_ctrl; i++)
+	for (i = 0; i < lvts_td->num_lvts_ctrl; i++) {
 		lvts_ctrl_set_enable(&lvts_td->lvts_ctrl[i], true);
+		usleep_range(100, 200);
+		lvts_ctrl_monitor_enable(dev, &lvts_td->lvts_ctrl[i], true);
+	}
 
 	return 0;
 }
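For readers without the driver at hand, the value that lvts_ctrl_monitor_enable() writes to MONCTL0 can be modelled in plain userspace C. This is only a sketch under stated assumptions: monctl0_value() and SENSOR_MAX are hypothetical names, and valid_mask is assumed to carry one bit per valid sensor, in the way lvts_for_each_valid_sensor() is expected to walk the controller's valid_sensor_mask. It is not the driver code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define BIT(n)             (1u << (n))
#define SENSOR_MAX         4        /* assumption: four sensing points per controller */
#define MONCTL0_SINGLE_PT  BIT(9)   /* "Single point access flow", per the patch comment */

/*
 * Hypothetical userspace model of the MONCTL0 value.
 * valid_mask: one bit per valid sensor (assumed layout).
 * enable:     true on resume, false on suspend.
 */
static uint32_t monctl0_value(uint8_t valid_mask, bool enable)
{
	uint32_t sensor_map = 0;

	if (enable) {
		/* Only sensing points backed by a valid sensor get enabled. */
		for (int i = 0; i < SENSOR_MAX; i++)
			if (valid_mask & BIT(i))
				sensor_map |= BIT(i);
	}

	/* Bit 9 is written in both cases; only bits 0-3 change. */
	return sensor_map | MONCTL0_SINGLE_PT;
}
```

With two valid sensors (valid_mask 0x3), the model yields 0x203 when enabling and 0x200 when disabling, so suspend clears the sensing-point bits while leaving bit 9 set.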
On Tue, Nov 26, 2024 at 5:21 AM Nícolas F. R. A. Prado <nfraprado@collabora.com> wrote:
> When configured in filtered mode, the LVTS thermal controller will monitor the temperature from the sensors and trigger an interrupt once a thermal threshold is crossed.
>
> Currently this is true even during suspend and resume. The problem with that is that when enabling the internal clock of the LVTS controller in lvts_ctrl_set_enable() during resume, the temperature reading can glitch and appear much higher than the real one, resulting in a spurious interrupt getting generated.
This sounds weird to me. On my end, the symptom is that the device sometimes cannot suspend. To be more precise, `echo mem > /sys/power/state` returns almost immediately. I think the irq is more likely to be triggered during suspension.
On Tue, Nov 26, 2024 at 04:00:42PM +0800, Hsin-Te Yuan wrote:
> On Tue, Nov 26, 2024 at 5:21 AM Nícolas F. R. A. Prado <nfraprado@collabora.com> wrote:
> > When configured in filtered mode, the LVTS thermal controller will monitor the temperature from the sensors and trigger an interrupt once a thermal threshold is crossed.
> >
> > Currently this is true even during suspend and resume. The problem with that is that when enabling the internal clock of the LVTS controller in lvts_ctrl_set_enable() during resume, the temperature reading can glitch and appear much higher than the real one, resulting in a spurious interrupt getting generated.
>
> This sounds weird to me. On my end, the symptom is that the device sometimes cannot suspend. To be more precise, `echo mem > /sys/power/state` returns almost immediately. I think the irq is more likely to be triggered during suspension.
Hi Hsin-Te,
please also check the first paragraph of the cover letter, and patch 2, that should clarify it. But anyway, I can explain it here too:
The issue you observed is caused by two things combined:
* When returning from resume with filtered mode enabled, the sensor temperature reading can glitch, appearing much higher. (fixed by this patch)
* Since the Stage 3 threshold is enabled and configured to take the maximum reading from the sensors, it will be triggered by that glitch and bring the system into a state where it can no longer suspend, it will just resume right away. (fixed by patch 2)
So currently, every so often, during resume both these things will happen, and any future suspend will resume right away. That's why this was never observed by me when testing a single suspend/resume. It only breaks on resume, and only affects future suspends, so you need to test multiple suspend/resumes on the same run to observe this issue.
And also since both things are needed to cause this issue, if you apply only patch 1 or only patch 2, it will already fix the issue.
Hope this clarifies it.
Thanks, Nícolas
On Tue, Nov 26, 2024 at 9:37 PM Nícolas F. R. A. Prado nfraprado@collabora.com wrote:
Thanks for the explanation!
Regards, Hsin-Te
On 25/11/24 22:20, Nícolas F. R. A. Prado wrote:
> diff --git a/drivers/thermal/mediatek/lvts_thermal.c b/drivers/thermal/mediatek/lvts_thermal.c
> index 1997e91bb3be94a3059db619238aa5787edc7675..a92ff2325c40704adc537af6995b34f93c3b0650 100644
> --- a/drivers/thermal/mediatek/lvts_thermal.c
> +++ b/drivers/thermal/mediatek/lvts_thermal.c
> @@ -860,6 +860,32 @@ static int lvts_ctrl_init(struct device *dev, struct lvts_domain *lvts_td,
>  	return 0;
>  }
> +static void lvts_ctrl_monitor_enable(struct device *dev, struct lvts_ctrl *lvts_ctrl, bool enable)
> +{
> +	/*
> +	 * Bitmaps to enable each sensor on filtered mode in the MONCTL0
> +	 * register.
> +	 */
> +	u32 sensor_filt_bitmap[] = { BIT(0), BIT(1), BIT(2), BIT(3) };
> +	u32 sensor_map = 0;
> +	int i;
> +
> +	if (lvts_ctrl->mode != LVTS_MSR_FILTERED_MODE)
> +		return;
That's easier and shorter:
static void lvts_ctrl_monitor_enable( .... )
{
	/* Bitmap to enable each sensor on filtered mode in the MONCTL0 register */
	const u32 sensor_map = GENMASK(3, 0);

	if (lvts_ctrl->mode != LVTS_MSR_FILTERED_MODE)
		return;

	/* Bits 0-3: Sensing points - Bit 9: Single point access flow */
	if (enable)
		writel(sensor_map | BIT(9), LVTS_MONCTL0(lvts_ctrl->base));
	else
		writel(BIT(9), LVTS_MONCTL0 ....
}
Cheers, Angelo
On Tue, Nov 26, 2024 at 10:43:55AM +0100, AngeloGioacchino Del Regno wrote:
> That's easier and shorter:
>
> static void lvts_ctrl_monitor_enable( .... )
> {
> 	/* Bitmap to enable each sensor on filtered mode in the MONCTL0 register */
> 	const u32 sensor_map = GENMASK(3, 0);
>
> 	if (lvts_ctrl->mode != LVTS_MSR_FILTERED_MODE)
> 		return;
>
> 	/* Bits 0-3: Sensing points - Bit 9: Single point access flow */
> 	if (enable)
> 		writel(sensor_map | BIT(9), LVTS_MONCTL0(lvts_ctrl->base));
Wait, no, here you're enabling all the sensors in the controller. We only want to enable ones that are valid, otherwise we might get garbage data and irqs from sensors that aren't actually there. That's why I use the lvts_for_each_valid_sensor() helper in this patch.
Thanks, Nícolas
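To make the objection concrete, the following sketch computes which sensing points an all-ones mask would turn on even though no valid sensor backs them. It is a userspace illustration only: spurious_sensing_points() is a hypothetical helper, and valid_mask is assumed to hold one bit per valid sensor.

```c
#include <assert.h>
#include <stdint.h>

#define GENMASK_3_0  0xFu   /* same value as the kernel's GENMASK(3, 0) */

/*
 * Hypothetical helper: sensing points that writing GENMASK(3, 0) would
 * enable although the controller has no valid sensor behind them.
 */
static uint32_t spurious_sensing_points(uint8_t valid_mask)
{
	return GENMASK_3_0 & ~(uint32_t)valid_mask;
}
```

For a controller with only sensors 0 and 1 valid (valid_mask 0x3), the result is 0xC, meaning sensing points 2 and 3 would be monitored without a real sensor. When all four sensors are valid the result is 0 and both approaches write the same bits.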
On 26/11/24 14:19, Nícolas F. R. A. Prado wrote:
> On Tue, Nov 26, 2024 at 10:43:55AM +0100, AngeloGioacchino Del Regno wrote:
> > That's easier and shorter:
> >
> > static void lvts_ctrl_monitor_enable( .... )
> > {
> > 	/* Bitmap to enable each sensor on filtered mode in the MONCTL0 register */
> > 	const u32 sensor_map = GENMASK(3, 0);
> >
> > 	if (lvts_ctrl->mode != LVTS_MSR_FILTERED_MODE)
> > 		return;
> >
> > 	/* Bits 0-3: Sensing points - Bit 9: Single point access flow */
> > 	if (enable)
> > 		writel(sensor_map | BIT(9), LVTS_MONCTL0(lvts_ctrl->base));
>
> Wait, no, here you're enabling all the sensors in the controller. We only want to enable ones that are valid, otherwise we might get garbage data and irqs from sensors that aren't actually there. That's why I use the lvts_for_each_valid_sensor() helper in this patch.
Whoa, my brain actually missed the lvts_for_each_valid_sensor()!
Okay no, then you're right - sorry for the bad example! In that case, though, I still have one more comment.
You can constify sensor_filt_bitmap, and since the values never go higher than BIT(3), you should also be able to spare some memory by turning that into a u8:
const u8 sensor_filt_bitmap[] = { BIT(0), BIT(1), BIT(2), BIT(3) };
...and then I assume that there's no way valid sensors could ever read from an index that is more than 4 (so, I assume that there's no way the loop tries to read out of the array upper boundary).
In which case - after at least constifying the sensor_filt_bitmap array, for v2 feel free to add my
Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
...and sorry again for the initial miss :-)
Cheers, Angelo
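The bounds assumption raised in this review can be written down explicitly. The sketch below is userspace C, not driver code: LVTS_SENSOR_MAX is assumed to be 4 (its value is not shown in this thread), and filt_bit() is a hypothetical accessor added only to show the check.

```c
#include <assert.h>
#include <stdint.h>

#define BIT(n)           (1u << (n))
#define ARRAY_SIZE(a)    (sizeof(a) / sizeof((a)[0]))
#define LVTS_SENSOR_MAX  4   /* assumption: value not shown in the thread */

/* Constified table; BIT(3) == 8 fits comfortably in a u8. */
static const uint8_t sensor_filt_bitmap[] = { BIT(0), BIT(1), BIT(2), BIT(3) };

/* Compile-time check that there is exactly one entry per possible sensor. */
_Static_assert(ARRAY_SIZE(sensor_filt_bitmap) == LVTS_SENSOR_MAX,
	       "one bitmap entry per sensor");

/* Hypothetical accessor that refuses an out-of-range sensor index. */
static uint8_t filt_bit(unsigned int sensor)
{
	assert(sensor < ARRAY_SIZE(sensor_filt_bitmap));
	return sensor_filt_bitmap[sensor];
}
```

If the valid-sensor iteration is bounded by the same constant as the table size, an index past the last entry cannot occur, and the static assertion would catch the two drifting apart.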
The Stage 3 thermal threshold is currently configured during the controller initialization to 105 Celsius. From the kernel perspective, this configuration is harmful because:
* The stage 3 interrupt that gets triggered when the threshold is crossed is not handled in any way by the IRQ handler, it just gets cleared. Besides, the temperature used for stage 3 comes from the sensors, and the critical thermal trip points described in the Devicetree will already cause a shutdown when crossed (at a lower temperature, of 100 Celsius, for all SoCs currently using this driver).
* The only effect of crossing the stage 3 threshold that has been observed is that it causes the machine to no longer be able to enter suspend. Even if that was a result of a momentary glitch in the temperature reading of a sensor (as has been observed on the MT8192-based Chromebooks).
For those reasons, disable the Stage 3 thermal threshold configuration.
Cc: stable@vger.kernel.org
Reported-by: Hsin-Te Yuan <yuanhsinte@chromium.org>
Closes: https://lore.kernel.org/all/20241108-lvts-v1-1-eee339c6ca20@chromium.org/
Fixes: f5f633b18234 ("thermal/drivers/mediatek: Add the Low Voltage Thermal Sensor driver")
Signed-off-by: Nícolas F. R. A. Prado <nfraprado@collabora.com>
---
 drivers/thermal/mediatek/lvts_thermal.c | 16 ++--------------
 1 file changed, 2 insertions(+), 14 deletions(-)

diff --git a/drivers/thermal/mediatek/lvts_thermal.c b/drivers/thermal/mediatek/lvts_thermal.c
index a92ff2325c40704adc537af6995b34f93c3b0650..6ac33030f015c7239e36d81018d1a6893cb69ef8 100644
--- a/drivers/thermal/mediatek/lvts_thermal.c
+++ b/drivers/thermal/mediatek/lvts_thermal.c
@@ -65,7 +65,7 @@
 #define LVTS_HW_FILTER				0x0
 #define LVTS_TSSEL_CONF				0x13121110
 #define LVTS_CALSCALE_CONF			0x300
-#define LVTS_MONINT_CONF			0x8300318C
+#define LVTS_MONINT_CONF			0x0300318C
 
 #define LVTS_MONINT_OFFSET_SENSOR0		0xC
 #define LVTS_MONINT_OFFSET_SENSOR1		0x180
@@ -91,8 +91,6 @@
 #define LVTS_MSR_READ_TIMEOUT_US	400
 #define LVTS_MSR_READ_WAIT_US		(LVTS_MSR_READ_TIMEOUT_US / 2)
 
-#define LVTS_HW_TSHUT_TEMP		105000
-
 #define LVTS_MINIMUM_THRESHOLD		20000
 
 static int golden_temp = LVTS_GOLDEN_TEMP_DEFAULT;
@@ -145,7 +143,6 @@ struct lvts_ctrl {
 	struct lvts_sensor sensors[LVTS_SENSOR_MAX];
 	const struct lvts_data *lvts_data;
 	u32 calibration[LVTS_SENSOR_MAX];
-	u32 hw_tshut_raw_temp;
 	u8 valid_sensor_mask;
 	int mode;
 	void __iomem *base;
@@ -837,14 +834,6 @@ static int lvts_ctrl_init(struct device *dev, struct lvts_domain *lvts_td,
 		 */
 		lvts_ctrl[i].mode = lvts_data->lvts_ctrl[i].mode;
 
-		/*
-		 * The temperature to raw temperature must be done
-		 * after initializing the calibration.
-		 */
-		lvts_ctrl[i].hw_tshut_raw_temp =
-			lvts_temp_to_raw(LVTS_HW_TSHUT_TEMP,
-					 lvts_data->temp_factor);
-
 		lvts_ctrl[i].low_thresh = INT_MIN;
 		lvts_ctrl[i].high_thresh = INT_MIN;
 	}
@@ -919,7 +908,6 @@ static int lvts_irq_init(struct lvts_ctrl *lvts_ctrl)
 	 *         10 : Selected sensor with bits 19-18
 	 *         11 : Reserved
 	 */
-	writel(BIT(16), LVTS_PROTCTL(lvts_ctrl->base));
 
 	/*
 	 * LVTS_PROTTA : Stage 1 temperature threshold
@@ -932,8 +920,8 @@ static int lvts_irq_init(struct lvts_ctrl *lvts_ctrl)
 	 *
 	 * writel(0x0, LVTS_PROTTA(lvts_ctrl->base));
 	 * writel(0x0, LVTS_PROTTB(lvts_ctrl->base));
+	 * writel(0x0, LVTS_PROTTC(lvts_ctrl->base));
 	 */
-	writel(lvts_ctrl->hw_tshut_raw_temp, LVTS_PROTTC(lvts_ctrl->base));
 
 	/*
 	 * LVTS_MONINT : Interrupt configuration register
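The change to LVTS_MONINT_CONF is easy to misread in hex, so here is a small userspace check of what actually differs between the two constants. The helper names are hypothetical. That the cleared bit is the stage 3 interrupt enable is an assumption drawn from the commit message; the diff itself only shows the two values.

```c
#include <assert.h>
#include <stdint.h>

#define OLD_MONINT_CONF  0x8300318Cu   /* value removed by the patch */
#define NEW_MONINT_CONF  0x0300318Cu   /* value added by the patch */

/* Bits that were set before the patch and are clear afterwards. */
static uint32_t monint_conf_cleared_bits(void)
{
	return OLD_MONINT_CONF & ~NEW_MONINT_CONF;
}

/* Bits that are newly set by the patch (expected to be none). */
static uint32_t monint_conf_added_bits(void)
{
	return NEW_MONINT_CONF & ~OLD_MONINT_CONF;
}
```

The only difference is bit 31 (0x80000000); every other interrupt enable bit in the configuration is left as it was.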
On 25/11/24 22:20, Nícolas F. R. A. Prado wrote:
> The Stage 3 thermal threshold is currently configured during the controller initialization to 105 Celsius. From the kernel perspective, this configuration is harmful because:
> * The stage 3 interrupt that gets triggered when the threshold is crossed is not handled in any way by the IRQ handler, it just gets cleared. Besides, the temperature used for stage 3 comes from the sensors, and the critical thermal trip points described in the Devicetree will already cause a shutdown when crossed (at a lower temperature, of 100 Celsius, for all SoCs currently using this driver).
> * The only effect of crossing the stage 3 threshold that has been observed is that it causes the machine to no longer be able to enter suspend. Even if that was a result of a momentary glitch in the temperature reading of a sensor (as has been observed on the MT8192-based Chromebooks).
>
> For those reasons, disable the Stage 3 thermal threshold configuration.
>
> Cc: stable@vger.kernel.org
> Reported-by: Hsin-Te Yuan <yuanhsinte@chromium.org>
> Closes: https://lore.kernel.org/all/20241108-lvts-v1-1-eee339c6ca20@chromium.org/
> Fixes: f5f633b18234 ("thermal/drivers/mediatek: Add the Low Voltage Thermal Sensor driver")
> Signed-off-by: Nícolas F. R. A. Prado <nfraprado@collabora.com>

Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
In order to get working interrupts, a low offset value needs to be configured. The minimum value for it is 20 Celsius, which is what is configured when there's no lower thermal trip (i.e. the thermal core passes -INT_MAX as low trip temperature). However, when the temperature gets that low and fluctuates around that value it causes an interrupt storm.
Prevent that interrupt storm by not enabling the low offset interrupt if the low threshold is the minimum one.
Cc: stable@vger.kernel.org
Fixes: 77354eaef821 ("thermal/drivers/mediatek/lvts_thermal: Don't leave threshold zeroed")
Signed-off-by: Nícolas F. R. A. Prado <nfraprado@collabora.com>
---
 drivers/thermal/mediatek/lvts_thermal.c | 48 ++++++++++++++++++++++++---------
 1 file changed, 35 insertions(+), 13 deletions(-)

diff --git a/drivers/thermal/mediatek/lvts_thermal.c b/drivers/thermal/mediatek/lvts_thermal.c
index 6ac33030f015c7239e36d81018d1a6893cb69ef8..2271023f090df82fbdd0b5755bb34879e58b0533 100644
--- a/drivers/thermal/mediatek/lvts_thermal.c
+++ b/drivers/thermal/mediatek/lvts_thermal.c
@@ -67,10 +67,14 @@
 #define LVTS_CALSCALE_CONF			0x300
 #define LVTS_MONINT_CONF			0x0300318C
 
-#define LVTS_MONINT_OFFSET_SENSOR0		0xC
-#define LVTS_MONINT_OFFSET_SENSOR1		0x180
-#define LVTS_MONINT_OFFSET_SENSOR2		0x3000
-#define LVTS_MONINT_OFFSET_SENSOR3		0x3000000
+#define LVTS_MONINT_OFFSET_HIGH_SENSOR0		BIT(3)
+#define LVTS_MONINT_OFFSET_HIGH_SENSOR1		BIT(8)
+#define LVTS_MONINT_OFFSET_HIGH_SENSOR2		BIT(13)
+#define LVTS_MONINT_OFFSET_HIGH_SENSOR3		BIT(25)
+#define LVTS_MONINT_OFFSET_LOW_SENSOR0		BIT(2)
+#define LVTS_MONINT_OFFSET_LOW_SENSOR1		BIT(7)
+#define LVTS_MONINT_OFFSET_LOW_SENSOR2		BIT(12)
+#define LVTS_MONINT_OFFSET_LOW_SENSOR3		BIT(24)
 
 #define LVTS_INT_SENSOR0			0x0009001F
 #define LVTS_INT_SENSOR1			0x001203E0
@@ -326,11 +330,17 @@ static int lvts_get_temp(struct thermal_zone_device *tz, int *temp)
 
 static void lvts_update_irq_mask(struct lvts_ctrl *lvts_ctrl)
 {
-	u32 masks[] = {
-		LVTS_MONINT_OFFSET_SENSOR0,
-		LVTS_MONINT_OFFSET_SENSOR1,
-		LVTS_MONINT_OFFSET_SENSOR2,
-		LVTS_MONINT_OFFSET_SENSOR3,
+	u32 high_offset_masks[] = {
+		LVTS_MONINT_OFFSET_HIGH_SENSOR0,
+		LVTS_MONINT_OFFSET_HIGH_SENSOR1,
+		LVTS_MONINT_OFFSET_HIGH_SENSOR2,
+		LVTS_MONINT_OFFSET_HIGH_SENSOR3,
+	};
+	u32 low_offset_masks[] = {
+		LVTS_MONINT_OFFSET_LOW_SENSOR0,
+		LVTS_MONINT_OFFSET_LOW_SENSOR1,
+		LVTS_MONINT_OFFSET_LOW_SENSOR2,
+		LVTS_MONINT_OFFSET_LOW_SENSOR3,
 	};
 	u32 value = 0;
 	int i;
@@ -339,10 +349,22 @@ static void lvts_update_irq_mask(struct lvts_ctrl *lvts_ctrl)
 
 	for (i = 0; i < ARRAY_SIZE(masks); i++) {
 		if (lvts_ctrl->sensors[i].high_thresh == lvts_ctrl->high_thresh
-		    && lvts_ctrl->sensors[i].low_thresh == lvts_ctrl->low_thresh)
-			value |= masks[i];
-		else
-			value &= ~masks[i];
+		    && lvts_ctrl->sensors[i].low_thresh == lvts_ctrl->low_thresh) {
+			/*
+			 * The minimum threshold needs to be configured in the
+			 * OFFSETL register to get working interrupts, but we
+			 * don't actually want to generate interrupts when
+			 * crossing it.
+			 */
+			if (lvts_ctrl->low_thresh == -INT_MAX) {
+				value &= ~low_offset_masks[i];
+				value |= high_offset_masks[i];
+			} else {
+				value |= low_offset_masks[i] | high_offset_masks[i];
+			}
+		} else {
+			value &= ~(low_offset_masks[i] | high_offset_masks[i]);
+		}
 	}
 
 	writel(value, LVTS_MONINT(lvts_ctrl->base));
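The per-sensor decision made by the new lvts_update_irq_mask() can be exercised in isolation. The sketch below is a userspace model under stated assumptions: update_irq_bits() is a hypothetical function, thresholds_match stands in for the comparison of the sensor's thresholds against the controller's, and the mask tables simply reuse the bit positions from the patch.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stdint.h>

#define BIT(n)  (1u << (n))

/* Bit positions taken from the patch: one high and one low bit per sensor. */
static const uint32_t high_offset_masks[] = { BIT(3), BIT(8), BIT(13), BIT(25) };
static const uint32_t low_offset_masks[]  = { BIT(2), BIT(7), BIT(12), BIT(24) };

/*
 * Hypothetical model of the update for sensor i.
 * thresholds_match: the sensor's thresholds equal the controller's.
 * low_thresh:       controller low threshold; -INT_MAX means "no low trip".
 */
static uint32_t update_irq_bits(uint32_t value, int i, bool thresholds_match,
				int low_thresh)
{
	if (!thresholds_match)
		return value & ~(low_offset_masks[i] | high_offset_masks[i]);

	/* No lower trip: keep the high offset IRQ, leave the low one disabled. */
	if (low_thresh == -INT_MAX)
		return (value & ~low_offset_masks[i]) | high_offset_masks[i];

	return value | low_offset_masks[i] | high_offset_masks[i];
}
```

For sensor 0 with no lower trip only BIT(3) ends up set, whereas with a real low trip the result is 0xC, which is the value of the old combined LVTS_MONINT_OFFSET_SENSOR0 mask. The split therefore preserves the previous behaviour whenever a low trip exists.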
On 25/11/24 22:20, Nícolas F. R. A. Prado wrote:
> diff --git a/drivers/thermal/mediatek/lvts_thermal.c b/drivers/thermal/mediatek/lvts_thermal.c
> index 6ac33030f015c7239e36d81018d1a6893cb69ef8..2271023f090df82fbdd0b5755bb34879e58b0533 100644
> --- a/drivers/thermal/mediatek/lvts_thermal.c
> +++ b/drivers/thermal/mediatek/lvts_thermal.c
> @@ -67,10 +67,14 @@
>  #define LVTS_CALSCALE_CONF			0x300
>  #define LVTS_MONINT_CONF			0x0300318C
> -#define LVTS_MONINT_OFFSET_SENSOR0		0xC
> -#define LVTS_MONINT_OFFSET_SENSOR1		0x180
> -#define LVTS_MONINT_OFFSET_SENSOR2		0x3000
> -#define LVTS_MONINT_OFFSET_SENSOR3		0x3000000
> +#define LVTS_MONINT_OFFSET_HIGH_SENSOR0		BIT(3)
Yeah it's longer, but that's more readable:
#define LVTS_MONINT_OFFSET_HIGH_INTEN_SENSOR0
...because what this BIT does is enabling the high offset interrupt for the sensing point 0 (which in this driver we call sensor 0).
That name would make it (imo) way less likely to need any datasheet to understand what is actually going on with that setting :-)
> +#define LVTS_MONINT_OFFSET_HIGH_SENSOR1		BIT(8)
> +#define LVTS_MONINT_OFFSET_HIGH_SENSOR2		BIT(13)
> +#define LVTS_MONINT_OFFSET_HIGH_SENSOR3		BIT(25)
> +#define LVTS_MONINT_OFFSET_LOW_SENSOR0		BIT(2)
Of course, the comment is valid for the LOW ones as well!
Everything else is good for me, and since it is just about simple renaming, I can already give you my
Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>