From: David Collins <david.collins@oss.qualcomm.com>
[ Upstream commit f8e157ff2df46ddabd930815d196895976227831 ]
Certain TEMP_ALARM GEN2 PMIC peripherals need over-temperature stage 2 automatic PMIC partial shutdown. This will ensure that in the event of reaching the hotter stage 3 over-temperature threshold, repeated faults will be avoided during the automatic PMIC hardware full shutdown. Modify the stage 2 shutdown control logic to ensure that stage 2 shutdown is enabled on all affected PMICs. Read the digital major and minor revision registers to identify these PMICs.
Signed-off-by: David Collins <david.collins@oss.qualcomm.com>
Signed-off-by: Anjelique Melendez <anjelique.melendez@oss.qualcomm.com>
Link: https://lore.kernel.org/r/20250710224555.3047790-2-anjelique.melendez@oss.qu...
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
LLM Generated explanations, may be completely bogus:
**Backport Status: YES**
This commit should be backported to stable kernel trees. Here's my detailed analysis:
## Key Reasons for Backporting:
### 1. **Fixes a Hardware Safety Issue**

The commit addresses a hardware protection mechanism: certain TEMP_ALARM GEN2 PMIC peripherals need stage 2 automatic partial shutdown so that "repeated faults" are avoided during the automatic full hardware shutdown that follows the stage 3 over-temperature threshold. Without this fix, affected hardware could fault repeatedly while shutting down under critical temperature conditions. The commit message does not say what those faults lead to, so system instability or hardware damage is a plausible but unconfirmed consequence.
### 2. **Targeted Hardware-Specific Fix**

The code changes identify specific PMIC revisions that require this protection:

```c
switch (dig_revision) {
case 0x0001:
case 0x0002:
case 0x0100:
case 0x0101:
	chip->require_stage2_shutdown = true;
	break;
}
```

This shows it's a targeted fix for known hardware issues, not a general enhancement.
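The revision check above can be sketched as a standalone helper. This is a minimal illustration, not the driver code: `make_dig_revision()`, `requires_stage2_shutdown()` and the `is_gen2` parameter are hypothetical names, and `is_gen2` stands in for the driver's `chip->subtype == QPNP_TM_SUBTYPE_GEN2` test. Only the `(dig_major << 8) | dig_minor` packing and the four revision values come from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Pack the two 8-bit revision registers as the patch does:
 * DIG_MAJOR in the high byte, DIG_MINOR in the low byte. */
uint32_t make_dig_revision(uint8_t dig_major, uint8_t dig_minor)
{
	return ((uint32_t)dig_major << 8) | dig_minor;
}

/* Hypothetical helper: true when stage 2 shutdown must stay enabled.
 * is_gen2 is an assumption standing in for the driver's subtype check. */
bool requires_stage2_shutdown(bool is_gen2, uint8_t dig_major,
			      uint8_t dig_minor)
{
	if (!is_gen2)
		return false;

	switch (make_dig_revision(dig_major, dig_minor)) {
	case 0x0001:
	case 0x0002:
	case 0x0100:
	case 0x0101:
		return true;
	default:
		return false;
	}
}

/* Usage example: major 1 / minor 0 packs to 0x0100, which is listed. */
void revision_examples(void)
{
	assert(make_dig_revision(0x01, 0x00) == 0x0100);
	assert(requires_stage2_shutdown(true, 0x01, 0x00));
	assert(!requires_stage2_shutdown(true, 0x02, 0x00));
}
```

Under this packing, the four listed values correspond to major.minor revisions 0.1, 0.2, 1.0 and 1.1.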
### 3. **Small and Contained Change**

The fix is minimal and self-contained:
- Adds a read of the DIG_MINOR register in `qpnp_tm_probe()`
- Adds a `require_stage2_shutdown` flag to the chip structure
- Modifies the logic in `qpnp_tm_update_critical_trip_temp()` to respect this flag
- Renames `SHUTDOWN_CTRL1_OVERRIDE_S2` and `disable_s2_shutdown` to their `STAGE2` spellings, with no functional effect
- Totals 34 insertions and 9 deletions in a single file, with clear boundaries
### 4. **Low Risk of Regression**

- The behavior change only affects GEN2 peripherals with digital revision 0x0001, 0x0002, 0x0100 or 0x0101
- For other hardware, behavior remains unchanged
- Apart from the identifier renames, the modification is additive: it doesn't alter existing functionality for unaffected devices
### 5. **Prevents Hardware Malfunction**

The comment added in `qpnp_tm_probe()` states that stage 2 automatic partial shutdown must remain enabled "to avoid potential repeated faults upon reaching over-temperature stage 3," and the commit message makes the same point. This indicates a real hardware fault scenario that users of the affected PMICs could encounter without this patch.
### 6. **Follows Stable Tree Rules**

According to stable kernel rules, this qualifies because it:
- Fixes a real bug (hardware protection failure)
- Is not a new feature
- Has minimal changes
- Addresses a specific hardware issue that affects users
### Technical Analysis:

The key change is in the `qpnp_tm_update_critical_trip_temp()` function where the condition is modified from:

```c
if (disable_s2_shutdown)
	reg |= SHUTDOWN_CTRL1_OVERRIDE_S2;
```

to:

```c
if (disable_stage2_shutdown && !chip->require_stage2_shutdown)
	reg |= SHUTDOWN_CTRL1_OVERRIDE_STAGE2;
```
This ensures that for the affected PMIC revisions, stage 2 shutdown is never disabled, providing the necessary hardware protection against thermal events.
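To make the effect on the register value concrete, here is a minimal sketch of how the SHUTDOWN_CTRL1 byte is composed. The function `shutdown_ctrl1_value()` and the `SKETCH_` macro names are hypothetical; the bit positions mirror the `BIT(6)`, `BIT(3)` and `GENMASK(1, 0)` definitions visible in the diff, and the driver itself writes the result with `qpnp_tm_write()` rather than returning it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit layout taken from the diff; the SKETCH_ names are illustrative. */
#define SKETCH_OVERRIDE_STAGE2	(1u << 6)	/* BIT(6) */
#define SKETCH_RATE_25HZ	(1u << 3)	/* BIT(3) */
#define SKETCH_THRESHOLD_MASK	0x3u		/* GENMASK(1, 0) */

/* Hypothetical helper: compute the SHUTDOWN_CTRL1 value for a given
 * threshold index and the two flags used by the patched condition. */
uint8_t shutdown_ctrl1_value(uint8_t thresh, bool disable_stage2_shutdown,
			     bool require_stage2_shutdown)
{
	unsigned int reg = SKETCH_RATE_25HZ;

	/* The threshold index must fit in the two low bits. */
	assert(thresh <= SKETCH_THRESHOLD_MASK);
	reg |= thresh;

	/* The override bit disables stage 2 shutdown, so it is only set
	 * when the PMIC does not require stage 2 shutdown. */
	if (disable_stage2_shutdown && !require_stage2_shutdown)
		reg |= SKETCH_OVERRIDE_STAGE2;

	return (uint8_t)reg;
}
```

With threshold index 3 and a request to disable stage 2 shutdown, an unaffected PMIC gets `0x4B` (override bit set), while an affected one gets `0x0B` (override bit clear, stage 2 shutdown stays enabled).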
Given that this fixes a hardware-specific thermal protection issue that could lead to system instability or damage, and the fix is minimal with low regression risk, this commit is an excellent candidate for stable backporting.
 drivers/thermal/qcom/qcom-spmi-temp-alarm.c | 43 ++++++++++++++++-----
 1 file changed, 34 insertions(+), 9 deletions(-)

diff --git a/drivers/thermal/qcom/qcom-spmi-temp-alarm.c b/drivers/thermal/qcom/qcom-spmi-temp-alarm.c
index a81e7d6e865f..4b91cc13ce34 100644
--- a/drivers/thermal/qcom/qcom-spmi-temp-alarm.c
+++ b/drivers/thermal/qcom/qcom-spmi-temp-alarm.c
@@ -1,6 +1,7 @@
 // SPDX-License-Identifier: GPL-2.0-only
 /*
  * Copyright (c) 2011-2015, 2017, 2020, The Linux Foundation. All rights reserved.
+ * Copyright (c) Qualcomm Technologies, Inc. and/or its subsidiaries.
  */

 #include <linux/bitops.h>
@@ -16,6 +17,7 @@

 #include "../thermal_hwmon.h"

+#define QPNP_TM_REG_DIG_MINOR		0x00
 #define QPNP_TM_REG_DIG_MAJOR		0x01
 #define QPNP_TM_REG_TYPE		0x04
 #define QPNP_TM_REG_SUBTYPE		0x05
@@ -31,7 +33,7 @@
 #define STATUS_GEN2_STATE_MASK		GENMASK(6, 4)
 #define STATUS_GEN2_STATE_SHIFT		4

-#define SHUTDOWN_CTRL1_OVERRIDE_S2	BIT(6)
+#define SHUTDOWN_CTRL1_OVERRIDE_STAGE2	BIT(6)
 #define SHUTDOWN_CTRL1_THRESHOLD_MASK	GENMASK(1, 0)

 #define SHUTDOWN_CTRL1_RATE_25HZ	BIT(3)
@@ -78,6 +80,7 @@ struct qpnp_tm_chip {
 	/* protects .thresh, .stage and chip registers */
 	struct mutex			lock;
 	bool				initialized;
+	bool				require_stage2_shutdown;

 	struct iio_channel		*adc;
 	const long			(*temp_map)[THRESH_COUNT][STAGE_COUNT];
@@ -220,13 +223,13 @@ static int qpnp_tm_update_critical_trip_temp(struct qpnp_tm_chip *chip,
 {
 	long stage2_threshold_min = (*chip->temp_map)[THRESH_MIN][1];
 	long stage2_threshold_max = (*chip->temp_map)[THRESH_MAX][1];
-	bool disable_s2_shutdown = false;
+	bool disable_stage2_shutdown = false;
 	u8 reg;

 	WARN_ON(!mutex_is_locked(&chip->lock));

 	/*
-	 * Default: S2 and S3 shutdown enabled, thresholds at
+	 * Default: Stage 2 and Stage 3 shutdown enabled, thresholds at
 	 * lowest threshold set, monitoring at 25Hz
 	 */
 	reg = SHUTDOWN_CTRL1_RATE_25HZ;
@@ -241,12 +244,12 @@ static int qpnp_tm_update_critical_trip_temp(struct qpnp_tm_chip *chip,
 		chip->thresh = THRESH_MAX -
 			((stage2_threshold_max - temp) /
 			 TEMP_THRESH_STEP);
-		disable_s2_shutdown = true;
+		disable_stage2_shutdown = true;
 	} else {
 		chip->thresh = THRESH_MAX;

 		if (chip->adc)
-			disable_s2_shutdown = true;
+			disable_stage2_shutdown = true;
 		else
 			dev_warn(chip->dev,
 				 "No ADC is configured and critical temperature %d mC is above the maximum stage 2 threshold of %ld mC! Configuring stage 2 shutdown at %ld mC.\n",
@@ -255,8 +258,8 @@ static int qpnp_tm_update_critical_trip_temp(struct qpnp_tm_chip *chip,

 skip:
 	reg |= chip->thresh;
-	if (disable_s2_shutdown)
-		reg |= SHUTDOWN_CTRL1_OVERRIDE_S2;
+	if (disable_stage2_shutdown && !chip->require_stage2_shutdown)
+		reg |= SHUTDOWN_CTRL1_OVERRIDE_STAGE2;

 	return qpnp_tm_write(chip, QPNP_TM_REG_SHUTDOWN_CTRL1, reg);
 }
@@ -350,8 +353,8 @@ static int qpnp_tm_probe(struct platform_device *pdev)
 {
 	struct qpnp_tm_chip *chip;
 	struct device_node *node;
-	u8 type, subtype, dig_major;
-	u32 res;
+	u8 type, subtype, dig_major, dig_minor;
+	u32 res, dig_revision;
 	int ret, irq;

 	node = pdev->dev.of_node;
@@ -402,6 +405,11 @@ static int qpnp_tm_probe(struct platform_device *pdev)
 		return dev_err_probe(&pdev->dev, ret,
 				     "could not read dig_major\n");

+	ret = qpnp_tm_read(chip, QPNP_TM_REG_DIG_MINOR, &dig_minor);
+	if (ret < 0)
+		return dev_err_probe(&pdev->dev, ret,
+				     "could not read dig_minor\n");
+
 	if (type != QPNP_TM_TYPE || (subtype != QPNP_TM_SUBTYPE_GEN1
 				     && subtype != QPNP_TM_SUBTYPE_GEN2)) {
 		dev_err(&pdev->dev, "invalid type 0x%02x or subtype 0x%02x\n",
@@ -415,6 +423,23 @@ static int qpnp_tm_probe(struct platform_device *pdev)
 	else
 		chip->temp_map = &temp_map_gen1;

+	if (chip->subtype == QPNP_TM_SUBTYPE_GEN2) {
+		dig_revision = (dig_major << 8) | dig_minor;
+		/*
+		 * Check if stage 2 automatic partial shutdown must remain
+		 * enabled to avoid potential repeated faults upon reaching
+		 * over-temperature stage 3.
+		 */
+		switch (dig_revision) {
+		case 0x0001:
+		case 0x0002:
+		case 0x0100:
+		case 0x0101:
+			chip->require_stage2_shutdown = true;
+			break;
+		}
+	}
+
 	/*
 	 * Register the sensor before initializing the hardware to be able to
 	 * read the trip points. get_temp() returns the default temperature