Patches 1 and 2 of this series fix the issue reported by Hsin-Te Yuan [1] where MT8192-based Chromebooks are not able to suspend/resume 10 times in a row. Either one of those patches on its own is enough to fix the issue, but I believe both are desirable, so I've included them both here.
Patches 3-5 fix unrelated issues that I've noticed while debugging. Patch 3 fixes IRQ storms when the temperature sensors drop to 20 Celsius. Patches 4 and 5 are cleanups to prevent future issues.
To test this series, I've run 'rtcwake -m mem -d 60' 10 times in a row on a MT8192-Asurada-Spherion-rev3 Chromebook and checked that the wakeup happened 60 seconds later (+-5 seconds). I've repeated that test on 10 separate runs. Not once did the chromebook wake up early with the series applied.
I've also checked that during those runs, the LVTS interrupt didn't trigger even once, while before the series it would trigger a few times per run, generally during boot or resume.
Finally, as a sanity check I've verified that the interrupts still work by lowering the thermal trip point to 45 Celsius and running 'stress -c 8'. Indeed they still do, and the temperature showed by the thermal_temperature ftrace event matched the expected value.
[1] https://lore.kernel.org/all/20241108-lvts-v1-1-eee339c6ca20@chromium.org/
Signed-off-by: Nícolas F. R. A. Prado nfraprado@collabora.com --- Changes in v2: - Renamed bitmasks for interrupt enable (added "INTEN" to the name) - Made read-only arrays static const - Changed sensor_filt_bitmap array from u32 to u8 to save memory - Rebased on next-20241209 - Link to v1: https://lore.kernel.org/r/20241125-mt8192-lvts-filtered-suspend-fix-v1-0-42e...
--- Nícolas F. R. A. Prado (5): thermal/drivers/mediatek/lvts: Disable monitor mode during suspend thermal/drivers/mediatek/lvts: Disable Stage 3 thermal threshold thermal/drivers/mediatek/lvts: Disable low offset IRQ for minimum threshold thermal/drivers/mediatek/lvts: Start sensor interrupts disabled thermal/drivers/mediatek/lvts: Only update IRQ enable for valid sensors
drivers/thermal/mediatek/lvts_thermal.c | 103 ++++++++++++++++++++++---------- 1 file changed, 72 insertions(+), 31 deletions(-) --- base-commit: d1486dca38afd08ca279ae94eb3a397f10737824 change-id: 20241121-mt8192-lvts-filtered-suspend-fix-a5032ca8eceb
Best regards,
When configured in filtered mode, the LVTS thermal controller will monitor the temperature from the sensors and trigger an interrupt once a thermal threshold is crossed.
Currently this is true even during suspend and resume. The problem with that is that when enabling the internal clock of the LVTS controller in lvts_ctrl_set_enable() during resume, the temperature reading can glitch and appear much higher than the real one, resulting in a spurious interrupt getting generated.
Disable the temperature monitoring and give some time for the signals to stabilize during suspend in order to prevent such spurious interrupts.
Cc: stable@vger.kernel.org Reported-by: Hsin-Te Yuan yuanhsinte@chromium.org Closes: https://lore.kernel.org/all/20241108-lvts-v1-1-eee339c6ca20@chromium.org/ Fixes: 8137bb90600d ("thermal/drivers/mediatek/lvts_thermal: Add suspend and resume") Reviewed-by: AngeloGioacchino Del Regno angelogioacchino.delregno@collabora.com Signed-off-by: Nícolas F. R. A. Prado nfraprado@collabora.com --- drivers/thermal/mediatek/lvts_thermal.c | 36 +++++++++++++++++++++++++++++++-- 1 file changed, 34 insertions(+), 2 deletions(-)
diff --git a/drivers/thermal/mediatek/lvts_thermal.c b/drivers/thermal/mediatek/lvts_thermal.c index 07f7f3b7a2fb569cfc300dc2126ea426e161adff..a1a438ebad33c1fff8ca9781e12ef9e278eef785 100644 --- a/drivers/thermal/mediatek/lvts_thermal.c +++ b/drivers/thermal/mediatek/lvts_thermal.c @@ -860,6 +860,32 @@ static int lvts_ctrl_init(struct device *dev, struct lvts_domain *lvts_td, return 0; }
+static void lvts_ctrl_monitor_enable(struct device *dev, struct lvts_ctrl *lvts_ctrl, bool enable) +{ + /* + * Bitmaps to enable each sensor on filtered mode in the MONCTL0 + * register. + */ + static const u8 sensor_filt_bitmap[] = { BIT(0), BIT(1), BIT(2), BIT(3) }; + u32 sensor_map = 0; + int i; + + if (lvts_ctrl->mode != LVTS_MSR_FILTERED_MODE) + return; + + if (enable) { + lvts_for_each_valid_sensor(i, lvts_ctrl) + sensor_map |= sensor_filt_bitmap[i]; + } + + /* + * Bits: + * 9: Single point access flow + * 0-3: Enable sensing point 0-3 + */ + writel(sensor_map | BIT(9), LVTS_MONCTL0(lvts_ctrl->base)); +} + /* * At this point the configuration register is the only place in the * driver where we write multiple values. Per hardware constraint, @@ -1381,8 +1407,11 @@ static int lvts_suspend(struct device *dev)
lvts_td = dev_get_drvdata(dev);
- for (i = 0; i < lvts_td->num_lvts_ctrl; i++) + for (i = 0; i < lvts_td->num_lvts_ctrl; i++) { + lvts_ctrl_monitor_enable(dev, &lvts_td->lvts_ctrl[i], false); + usleep_range(100, 200); lvts_ctrl_set_enable(&lvts_td->lvts_ctrl[i], false); + }
clk_disable_unprepare(lvts_td->clk);
@@ -1400,8 +1429,11 @@ static int lvts_resume(struct device *dev) if (ret) return ret;
- for (i = 0; i < lvts_td->num_lvts_ctrl; i++) + for (i = 0; i < lvts_td->num_lvts_ctrl; i++) { lvts_ctrl_set_enable(&lvts_td->lvts_ctrl[i], true); + usleep_range(100, 200); + lvts_ctrl_monitor_enable(dev, &lvts_td->lvts_ctrl[i], true); + }
return 0; }
Hi Nicolas,
On 13/01/2025 14:27, Nícolas F. R. A. Prado wrote:
When configured in filtered mode, the LVTS thermal controller will monitor the temperature from the sensors and trigger an interrupt once a thermal threshold is crossed.
Currently this is true even during suspend and resume. The problem with that is that when enabling the internal clock of the LVTS controller in lvts_ctrl_set_enable() during resume, the temperature reading can glitch and appear much higher than the real one, resulting in a spurious interrupt getting generated.
Disable the temperature monitoring and give some time for the signals to stabilize during suspend in order to prevent such spurious interrupts.
Cc: stable@vger.kernel.org Reported-by: Hsin-Te Yuan yuanhsinte@chromium.org Closes: https://lore.kernel.org/all/20241108-lvts-v1-1-eee339c6ca20@chromium.org/ Fixes: 8137bb90600d ("thermal/drivers/mediatek/lvts_thermal: Add suspend and resume") Reviewed-by: AngeloGioacchino Del Regno angelogioacchino.delregno@collabora.com Signed-off-by: Nícolas F. R. A. Prado nfraprado@collabora.com
drivers/thermal/mediatek/lvts_thermal.c | 36 +++++++++++++++++++++++++++++++-- 1 file changed, 34 insertions(+), 2 deletions(-)
diff --git a/drivers/thermal/mediatek/lvts_thermal.c b/drivers/thermal/mediatek/lvts_thermal.c index 07f7f3b7a2fb569cfc300dc2126ea426e161adff..a1a438ebad33c1fff8ca9781e12ef9e278eef785 100644 --- a/drivers/thermal/mediatek/lvts_thermal.c +++ b/drivers/thermal/mediatek/lvts_thermal.c @@ -860,6 +860,32 @@ static int lvts_ctrl_init(struct device *dev, struct lvts_domain *lvts_td, return 0; } +static void lvts_ctrl_monitor_enable(struct device *dev, struct lvts_ctrl *lvts_ctrl, bool enable) +{
- /*
* Bitmaps to enable each sensor on filtered mode in the MONCTL0
* register.
*/
- static const u8 sensor_filt_bitmap[] = { BIT(0), BIT(1), BIT(2), BIT(3) };
- u32 sensor_map = 0;
- int i;
- if (lvts_ctrl->mode != LVTS_MSR_FILTERED_MODE)
return;
- if (enable) {
lvts_for_each_valid_sensor(i, lvts_ctrl)
sensor_map |= sensor_filt_bitmap[i];
- }
- /*
* Bits:
* 9: Single point access flow
* 0-3: Enable sensing point 0-3
*/
- writel(sensor_map | BIT(9), LVTS_MONCTL0(lvts_ctrl->base));
+}
- /*
- At this point the configuration register is the only place in the
- driver where we write multiple values. Per hardware constraint,
@@ -1381,8 +1407,11 @@ static int lvts_suspend(struct device *dev) lvts_td = dev_get_drvdata(dev);
- for (i = 0; i < lvts_td->num_lvts_ctrl; i++)
- for (i = 0; i < lvts_td->num_lvts_ctrl; i++) {
lvts_ctrl_monitor_enable(dev, &lvts_td->lvts_ctrl[i], false);
usleep_range(100, 200);
From where this delay is coming from ?
lvts_ctrl_set_enable(&lvts_td->lvts_ctrl[i], false);
- }
clk_disable_unprepare(lvts_td->clk); @@ -1400,8 +1429,11 @@ static int lvts_resume(struct device *dev) if (ret) return ret;
- for (i = 0; i < lvts_td->num_lvts_ctrl; i++)
- for (i = 0; i < lvts_td->num_lvts_ctrl; i++) { lvts_ctrl_set_enable(&lvts_td->lvts_ctrl[i], true);
usleep_range(100, 200);
lvts_ctrl_monitor_enable(dev, &lvts_td->lvts_ctrl[i], true);
- }
return 0; }
On Tue, Jan 14, 2025 at 10:23:42AM +0100, Daniel Lezcano wrote:
Hi Nicolas,
On 13/01/2025 14:27, Nícolas F. R. A. Prado wrote:
When configured in filtered mode, the LVTS thermal controller will monitor the temperature from the sensors and trigger an interrupt once a thermal threshold is crossed.
Currently this is true even during suspend and resume. The problem with that is that when enabling the internal clock of the LVTS controller in lvts_ctrl_set_enable() during resume, the temperature reading can glitch and appear much higher than the real one, resulting in a spurious interrupt getting generated.
Disable the temperature monitoring and give some time for the signals to stabilize during suspend in order to prevent such spurious interrupts.
Cc: stable@vger.kernel.org Reported-by: Hsin-Te Yuan yuanhsinte@chromium.org Closes: https://lore.kernel.org/all/20241108-lvts-v1-1-eee339c6ca20@chromium.org/ Fixes: 8137bb90600d ("thermal/drivers/mediatek/lvts_thermal: Add suspend and resume") Reviewed-by: AngeloGioacchino Del Regno angelogioacchino.delregno@collabora.com Signed-off-by: Nícolas F. R. A. Prado nfraprado@collabora.com
drivers/thermal/mediatek/lvts_thermal.c | 36 +++++++++++++++++++++++++++++++-- 1 file changed, 34 insertions(+), 2 deletions(-)
diff --git a/drivers/thermal/mediatek/lvts_thermal.c b/drivers/thermal/mediatek/lvts_thermal.c index 07f7f3b7a2fb569cfc300dc2126ea426e161adff..a1a438ebad33c1fff8ca9781e12ef9e278eef785 100644 --- a/drivers/thermal/mediatek/lvts_thermal.c +++ b/drivers/thermal/mediatek/lvts_thermal.c @@ -860,6 +860,32 @@ static int lvts_ctrl_init(struct device *dev, struct lvts_domain *lvts_td, return 0; } +static void lvts_ctrl_monitor_enable(struct device *dev, struct lvts_ctrl *lvts_ctrl, bool enable) +{
- /*
* Bitmaps to enable each sensor on filtered mode in the MONCTL0
* register.
*/
- static const u8 sensor_filt_bitmap[] = { BIT(0), BIT(1), BIT(2), BIT(3) };
- u32 sensor_map = 0;
- int i;
- if (lvts_ctrl->mode != LVTS_MSR_FILTERED_MODE)
return;
- if (enable) {
lvts_for_each_valid_sensor(i, lvts_ctrl)
sensor_map |= sensor_filt_bitmap[i];
- }
- /*
* Bits:
* 9: Single point access flow
* 0-3: Enable sensing point 0-3
*/
- writel(sensor_map | BIT(9), LVTS_MONCTL0(lvts_ctrl->base));
+}
- /*
- At this point the configuration register is the only place in the
- driver where we write multiple values. Per hardware constraint,
@@ -1381,8 +1407,11 @@ static int lvts_suspend(struct device *dev) lvts_td = dev_get_drvdata(dev);
- for (i = 0; i < lvts_td->num_lvts_ctrl; i++)
- for (i = 0; i < lvts_td->num_lvts_ctrl; i++) {
lvts_ctrl_monitor_enable(dev, &lvts_td->lvts_ctrl[i], false);
usleep_range(100, 200);
From where this delay is coming from ?
That's empirical. I tested several times doing system suspend and resume on the machines hooked to our lab and that was the minimum delay I could find that still never resulted in the spurious readings.
Unfortunately the technical documentation I have access to never even mentioned that this issue could arise, let alone what the timing constraints were, so this had to be figured out empirically.
Thanks, Nícolas
The Stage 3 thermal threshold is currently configured during the controller initialization to 105 Celsius. From the kernel perspective, this configuration is harmful because: * The stage 3 interrupt that gets triggered when the threshold is crossed is not handled in any way by the IRQ handler, it just gets cleared. Besides, the temperature used for stage 3 comes from the sensors, and the critical thermal trip points described in the Devicetree will already cause a shutdown when crossed (at a lower temperature, of 100 Celsius, for all SoCs currently using this driver). * The only effect of crossing the stage 3 threshold that has been observed is that it causes the machine to no longer be able to enter suspend. Even if that was a result of a momentary glitch in the temperature reading of a sensor (as has been observed on the MT8192-based Chromebooks).
For those reasons, disable the Stage 3 thermal threshold configuration.
Cc: stable@vger.kernel.org Reported-by: Hsin-Te Yuan yuanhsinte@chromium.org Closes: https://lore.kernel.org/all/20241108-lvts-v1-1-eee339c6ca20@chromium.org/ Fixes: f5f633b18234 ("thermal/drivers/mediatek: Add the Low Voltage Thermal Sensor driver") Reviewed-by: AngeloGioacchino Del Regno angelogioacchino.delregno@collabora.com Signed-off-by: Nícolas F. R. A. Prado nfraprado@collabora.com --- drivers/thermal/mediatek/lvts_thermal.c | 16 ++-------------- 1 file changed, 2 insertions(+), 14 deletions(-)
diff --git a/drivers/thermal/mediatek/lvts_thermal.c b/drivers/thermal/mediatek/lvts_thermal.c index a1a438ebad33c1fff8ca9781e12ef9e278eef785..0aaa44b734ca43e6abfd97b2ca4ce34dc6f15826 100644 --- a/drivers/thermal/mediatek/lvts_thermal.c +++ b/drivers/thermal/mediatek/lvts_thermal.c @@ -65,7 +65,7 @@ #define LVTS_HW_FILTER 0x0 #define LVTS_TSSEL_CONF 0x13121110 #define LVTS_CALSCALE_CONF 0x300 -#define LVTS_MONINT_CONF 0x8300318C +#define LVTS_MONINT_CONF 0x0300318C
#define LVTS_MONINT_OFFSET_SENSOR0 0xC #define LVTS_MONINT_OFFSET_SENSOR1 0x180 @@ -91,8 +91,6 @@ #define LVTS_MSR_READ_TIMEOUT_US 400 #define LVTS_MSR_READ_WAIT_US (LVTS_MSR_READ_TIMEOUT_US / 2)
-#define LVTS_HW_TSHUT_TEMP 105000 - #define LVTS_MINIMUM_THRESHOLD 20000
static int golden_temp = LVTS_GOLDEN_TEMP_DEFAULT; @@ -145,7 +143,6 @@ struct lvts_ctrl { struct lvts_sensor sensors[LVTS_SENSOR_MAX]; const struct lvts_data *lvts_data; u32 calibration[LVTS_SENSOR_MAX]; - u32 hw_tshut_raw_temp; u8 valid_sensor_mask; int mode; void __iomem *base; @@ -837,14 +834,6 @@ static int lvts_ctrl_init(struct device *dev, struct lvts_domain *lvts_td, */ lvts_ctrl[i].mode = lvts_data->lvts_ctrl[i].mode;
- /* - * The temperature to raw temperature must be done - * after initializing the calibration. - */ - lvts_ctrl[i].hw_tshut_raw_temp = - lvts_temp_to_raw(LVTS_HW_TSHUT_TEMP, - lvts_data->temp_factor); - lvts_ctrl[i].low_thresh = INT_MIN; lvts_ctrl[i].high_thresh = INT_MIN; } @@ -919,7 +908,6 @@ static int lvts_irq_init(struct lvts_ctrl *lvts_ctrl) * 10 : Selected sensor with bits 19-18 * 11 : Reserved */ - writel(BIT(16), LVTS_PROTCTL(lvts_ctrl->base));
/* * LVTS_PROTTA : Stage 1 temperature threshold @@ -932,8 +920,8 @@ static int lvts_irq_init(struct lvts_ctrl *lvts_ctrl) * * writel(0x0, LVTS_PROTTA(lvts_ctrl->base)); * writel(0x0, LVTS_PROTTB(lvts_ctrl->base)); + * writel(0x0, LVTS_PROTTC(lvts_ctrl->base)); */ - writel(lvts_ctrl->hw_tshut_raw_temp, LVTS_PROTTC(lvts_ctrl->base));
/* * LVTS_MONINT : Interrupt configuration register
On 13/01/2025 14:27, Nícolas F. R. A. Prado wrote:
The Stage 3 thermal threshold is currently configured during the controller initialization to 105 Celsius. From the kernel perspective, this configuration is harmful because:
- The stage 3 interrupt that gets triggered when the threshold is crossed is not handled in any way by the IRQ handler, it just gets cleared. Besides, the temperature used for stage 3 comes from the sensors, and the critical thermal trip points described in the Devicetree will already cause a shutdown when crossed (at a lower temperature, of 100 Celsius, for all SoCs currently using this driver).
- The only effect of crossing the stage 3 threshold that has been observed is that it causes the machine to no longer be able to enter suspend. Even if that was a result of a momentary glitch in the temperature reading of a sensor (as has been observed on the MT8192-based Chromebooks).
For those reasons, disable the Stage 3 thermal threshold configuration.
Does this stage 3 not designed to reset the system ? So the interrupt line should be attached to the reset line ? (just asking)
On Tue, Jan 14, 2025 at 12:54:43PM +0100, Daniel Lezcano wrote:
On 13/01/2025 14:27, Nícolas F. R. A. Prado wrote:
The Stage 3 thermal threshold is currently configured during the controller initialization to 105 Celsius. From the kernel perspective, this configuration is harmful because:
- The stage 3 interrupt that gets triggered when the threshold is crossed is not handled in any way by the IRQ handler, it just gets cleared. Besides, the temperature used for stage 3 comes from the sensors, and the critical thermal trip points described in the Devicetree will already cause a shutdown when crossed (at a lower temperature, of 100 Celsius, for all SoCs currently using this driver).
- The only effect of crossing the stage 3 threshold that has been observed is that it causes the machine to no longer be able to enter suspend. Even if that was a result of a momentary glitch in the temperature reading of a sensor (as has been observed on the MT8192-based Chromebooks).
For those reasons, disable the Stage 3 thermal threshold configuration.
Does this stage 3 not designed to reset the system ? So the interrupt line should be attached to the reset line ? (just asking)
Yes, my guess is that the intention of stage 3 is to cause a system reset, however it clearly does not cause a system reset, so it is not directly connected to the reset line in any way. Instead it is up to the kernel to receive the interrupt and deal with it. But then, since there are lower thermal thresholds that already cause a system shutdown, it's useless to keep this around - it only causes suspend/resume misbehaviors in case there are spurious readings.
Thanks, Nícolas
In order to get working interrupts, a low offset value needs to be configured. The minimum value for it is 20 Celsius, which is what is configured when there's no lower thermal trip (ie the thermal core passes -INT_MAX as low trip temperature). However, when the temperature gets that low and fluctuates around that value it causes an interrupt storm.
Prevent that interrupt storm by not enabling the low offset interrupt if the low threshold is the minimum one.
Cc: stable@vger.kernel.org Fixes: 77354eaef821 ("thermal/drivers/mediatek/lvts_thermal: Don't leave threshold zeroed") Reviewed-by: AngeloGioacchino Del Regno angelogioacchino.delregno@collabora.com Signed-off-by: Nícolas F. R. A. Prado nfraprado@collabora.com --- drivers/thermal/mediatek/lvts_thermal.c | 48 ++++++++++++++++++++++++--------- 1 file changed, 35 insertions(+), 13 deletions(-)
diff --git a/drivers/thermal/mediatek/lvts_thermal.c b/drivers/thermal/mediatek/lvts_thermal.c index 0aaa44b734ca43e6abfd97b2ca4ce34dc6f15826..04bfbfe93a71ee9e3428bfd7f8bd359fe9446e88 100644 --- a/drivers/thermal/mediatek/lvts_thermal.c +++ b/drivers/thermal/mediatek/lvts_thermal.c @@ -67,10 +67,14 @@ #define LVTS_CALSCALE_CONF 0x300 #define LVTS_MONINT_CONF 0x0300318C
-#define LVTS_MONINT_OFFSET_SENSOR0 0xC -#define LVTS_MONINT_OFFSET_SENSOR1 0x180 -#define LVTS_MONINT_OFFSET_SENSOR2 0x3000 -#define LVTS_MONINT_OFFSET_SENSOR3 0x3000000 +#define LVTS_MONINT_OFFSET_HIGH_INTEN_SENSOR0 BIT(3) +#define LVTS_MONINT_OFFSET_HIGH_INTEN_SENSOR1 BIT(8) +#define LVTS_MONINT_OFFSET_HIGH_INTEN_SENSOR2 BIT(13) +#define LVTS_MONINT_OFFSET_HIGH_INTEN_SENSOR3 BIT(25) +#define LVTS_MONINT_OFFSET_LOW_INTEN_SENSOR0 BIT(2) +#define LVTS_MONINT_OFFSET_LOW_INTEN_SENSOR1 BIT(7) +#define LVTS_MONINT_OFFSET_LOW_INTEN_SENSOR2 BIT(12) +#define LVTS_MONINT_OFFSET_LOW_INTEN_SENSOR3 BIT(24)
#define LVTS_INT_SENSOR0 0x0009001F #define LVTS_INT_SENSOR1 0x001203E0 @@ -326,11 +330,17 @@ static int lvts_get_temp(struct thermal_zone_device *tz, int *temp)
static void lvts_update_irq_mask(struct lvts_ctrl *lvts_ctrl) { - static const u32 masks[] = { - LVTS_MONINT_OFFSET_SENSOR0, - LVTS_MONINT_OFFSET_SENSOR1, - LVTS_MONINT_OFFSET_SENSOR2, - LVTS_MONINT_OFFSET_SENSOR3, + static const u32 high_offset_inten_masks[] = { + LVTS_MONINT_OFFSET_HIGH_INTEN_SENSOR0, + LVTS_MONINT_OFFSET_HIGH_INTEN_SENSOR1, + LVTS_MONINT_OFFSET_HIGH_INTEN_SENSOR2, + LVTS_MONINT_OFFSET_HIGH_INTEN_SENSOR3, + }; + static const u32 low_offset_inten_masks[] = { + LVTS_MONINT_OFFSET_LOW_INTEN_SENSOR0, + LVTS_MONINT_OFFSET_LOW_INTEN_SENSOR1, + LVTS_MONINT_OFFSET_LOW_INTEN_SENSOR2, + LVTS_MONINT_OFFSET_LOW_INTEN_SENSOR3, }; u32 value = 0; int i; @@ -339,10 +349,22 @@ static void lvts_update_irq_mask(struct lvts_ctrl *lvts_ctrl)
for (i = 0; i < ARRAY_SIZE(masks); i++) { if (lvts_ctrl->sensors[i].high_thresh == lvts_ctrl->high_thresh - && lvts_ctrl->sensors[i].low_thresh == lvts_ctrl->low_thresh) - value |= masks[i]; - else - value &= ~masks[i]; + && lvts_ctrl->sensors[i].low_thresh == lvts_ctrl->low_thresh) { + /* + * The minimum threshold needs to be configured in the + * OFFSETL register to get working interrupts, but we + * don't actually want to generate interrupts when + * crossing it. + */ + if (lvts_ctrl->low_thresh == -INT_MAX) { + value &= ~low_offset_inten_masks[i]; + value |= high_offset_inten_masks[i]; + } else { + value |= low_offset_inten_masks[i] | high_offset_inten_masks[i]; + } + } else { + value &= ~(low_offset_inten_masks[i] | high_offset_inten_masks[i]); + } }
writel(value, LVTS_MONINT(lvts_ctrl->base));
On 13/01/2025 14:27, Nícolas F. R. A. Prado wrote:
In order to get working interrupts, a low offset value needs to be configured. The minimum value for it is 20 Celsius, which is what is configured when there's no lower thermal trip (ie the thermal core passes -INT_MAX as low trip temperature). However, when the temperature gets that low and fluctuates around that value it causes an interrupt storm.
Is it really about an irq storm or about having a temperature threshold set close to the ambiant temperature. So leading to unnecessary wakeups as there is need for mitigation ?
Prevent that interrupt storm by not enabling the low offset interrupt if the low threshold is the minimum one.
The case where the high threshold is the INT_MAX should be handled too. The system may have configured a thermal zone without critical trip points, so setting the next upper threshold will program the register with INT_MAX. I guess it is an undefined behavior in this case, right ?
Cc: stable@vger.kernel.org
[ ... ]
On Tue, Jan 14, 2025 at 07:30:31PM +0100, Daniel Lezcano wrote:
On 13/01/2025 14:27, Nícolas F. R. A. Prado wrote:
In order to get working interrupts, a low offset value needs to be configured. The minimum value for it is 20 Celsius, which is what is configured when there's no lower thermal trip (ie the thermal core passes -INT_MAX as low trip temperature). However, when the temperature gets that low and fluctuates around that value it causes an interrupt storm.
Is it really about an irq storm or about having a temperature threshold set close to the ambiant temperature. So leading to unnecessary wakeups as there is need for mitigation ?
Yes, that's what I mean. The irq threshold gets configured to 20C, so whenever the temperature drops below that value, the IRQ gets triggered. But this usually does not happen just once, because from the thermal frameworks' perspective, there's no thermal threshold configured for 20C, since that's done from the driver, the framework thinks it's -INT_MAX, so the threshold doesn't get moved after the trigger and it just ends up triggering hundreds or thousands of times in a short span of time, hence why I say it's an interrupt storm.
Prevent that interrupt storm by not enabling the low offset interrupt if the low threshold is the minimum one.
The case where the high threshold is the INT_MAX should be handled too. The system may have configured a thermal zone without critical trip points, so setting the next upper threshold will program the register with INT_MAX. I guess it is an undefined behavior in this case, right ?
Ah, yes, I don't think I've tested that before... I'll test it and send a fix if needed.
Thanks, Nícolas
linux-stable-mirror@lists.linaro.org