From: Aaron Plattner <aplattner@nvidia.com>
[ Upstream commit 48defdf6b083f74a44e1f742db284960d3444aec ]
The MediaTek implementation of the sbsa_gwdt watchdog has a race condition where a write to SBSA_GWDT_WRR is ignored if it occurs while the hardware is processing a timeout refresh that asserts WS0.
Detect this based on the hardware implementer and adjust wdd->min_hw_heartbeat_ms to avoid the race by forcing the keepalive ping to be one second later.
Signed-off-by: Aaron Plattner <aplattner@nvidia.com>
Acked-by: Timur Tabi <ttabi@nvidia.com>
Reviewed-by: Guenter Roeck <linux@roeck-us.net>
Link: https://lore.kernel.org/r/20250721230640.2244915-1-aplattner@nvidia.com
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Wim Van Sebroeck <wim@linux-watchdog.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
LLM Generated explanations, may be completely bogus:
**Backport Status: YES**
This commit should be backported to stable kernel trees. Here's my extensive analysis:
## Nature of the Fix
This is a **hardware-specific bug fix** that addresses a race condition in MediaTek's implementation of the SBSA generic watchdog. The race occurs when a watchdog keepalive ping (`writel(0, gwdt->refresh_base + SBSA_GWDT_WRR)`) happens simultaneously with the hardware asserting the WS0 timeout signal. In affected hardware, this write is silently ignored, potentially leading to unexpected system resets.
## Key Code Changes Analysis
1. **Hardware Detection** (lines 78-79, 217-219):
   - Adds `SBSA_GWDT_IMPL_MEDIATEK` constant (0x426) to identify MediaTek hardware
   - Reads hardware implementer ID from `SBSA_GWDT_W_IIDR` register
   - Sets `need_ws0_race_workaround` flag only for MediaTek implementations when `action=0`
2. **Workaround Implementation** (lines 171-191):
   - Adjusts `wdd->min_hw_heartbeat_ms` to `timeout * 500 + 1000`
   - This forces keepalive pings to occur 1 second after the WOR timeout
   - Avoids the exact moment when WS0 is being asserted
3. **Minimum Timeout Adjustment** (lines 337-343):
   - Sets minimum timeout to 3 seconds for affected hardware
   - Ensures WOR is set to 1.5 seconds with heartbeats every 2.5 seconds
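The timing arithmetic in items 2 and 3 can be checked with a small standalone sketch. The helper names below (`wor_ms`, `min_heartbeat_ms`, `margin_ms`) are hypothetical and do not exist in the driver. The sketch assumes `action=0`, where WOR is programmed to half the timeout and WS1 (the reset) follows one further WOR period after WS0.

```c
#include <stdio.h>

/* Hypothetical helper: WOR period in ms when action=0 (half the timeout). */
static int wor_ms(int timeout_s)
{
	return timeout_s * 500;
}

/* Mirrors the patch: earliest keepalive is one second after WOR expires. */
static int min_heartbeat_ms(int timeout_s)
{
	return timeout_s * 500 + 1000;
}

/*
 * Assumption for illustration: WS1 fires two WOR periods after the last
 * refresh, i.e. at timeout_s * 1000 ms. The margin is how long before
 * that point the delayed keepalive is sent.
 */
static int margin_ms(int timeout_s)
{
	return 2 * wor_ms(timeout_s) - min_heartbeat_ms(timeout_s);
}

/* Prints one summary line for a given timeout. */
static void print_timing(int timeout_s)
{
	printf("timeout=%ds WOR=%dms heartbeat=%dms margin=%dms\n",
	       timeout_s, wor_ms(timeout_s), min_heartbeat_ms(timeout_s),
	       margin_ms(timeout_s));
}
```

Under these assumptions a 3-second timeout leaves a 500 ms margin before WS1, while a 2-second timeout would leave none. That is one plausible reading of why the patch raises `min_timeout` to 3; the patch comment itself only states the 1.5 s and 2.5 s figures.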
## Why This Qualifies for Stable Backport
1. **Fixes a Real Bug**: Addresses a hardware race condition that can cause **unexpected system resets** - a critical reliability issue in production systems.
2. **Limited Scope**:
   - Only affects MediaTek SBSA watchdog implementations
   - Workaround is conditionally applied based on hardware detection
   - No impact on other implementations
3. **Minimal Risk**:
   - Changes are confined to timing adjustments
   - No architectural changes or new features
   - Preserves existing behavior for all non-MediaTek hardware
4. **Clear Hardware Issue**: The commit message explicitly describes the race condition where "a write to SBSA_GWDT_WRR is ignored if it occurs while the hardware is processing a timeout refresh that asserts WS0."
5. **Production Impact**: Without this fix, systems with MediaTek SBSA watchdog can experience spurious reboots when the race condition is triggered, affecting system availability.
6. **Follows Stable Rules**:
   - Fixes an actual bug (not adding features)
   - Small, contained change (~50 lines)
   - Already reviewed by subsystem maintainers
   - Hardware-specific fix with clear problem/solution
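The "conditionally applied based on hardware detection" point under Limited Scope can be illustrated outside the kernel. The mask, shift and implementer constants below are copied from the patch. The helper functions and the sample IIDR values are hypothetical and only show how the fields are extracted.

```c
#include <stdbool.h>
#include <stdint.h>

/* Field layout as defined in the patch. */
#define SBSA_GWDT_VERSION_MASK	0xF
#define SBSA_GWDT_VERSION_SHIFT	16
#define SBSA_GWDT_IMPL_MASK	0x7FF
#define SBSA_GWDT_IMPL_SHIFT	0
#define SBSA_GWDT_IMPL_MEDIATEK	0x426

/* Hypothetical helper: architecture version field of a W_IIDR value. */
static unsigned int iidr_version(uint32_t iidr)
{
	return (iidr >> SBSA_GWDT_VERSION_SHIFT) & SBSA_GWDT_VERSION_MASK;
}

/* Hypothetical helper: implementer field of a W_IIDR value. */
static unsigned int iidr_implementer(uint32_t iidr)
{
	return (iidr >> SBSA_GWDT_IMPL_SHIFT) & SBSA_GWDT_IMPL_MASK;
}

/*
 * Same condition as the patch: the workaround applies only when the
 * implementer is MediaTek and the module parameter action is 0.
 */
static bool needs_ws0_race_workaround(uint32_t iidr, int action)
{
	return !action && iidr_implementer(iidr) == SBSA_GWDT_IMPL_MEDIATEK;
}
```

For a made-up register value such as `0x00010426`, the sketch reports version 1 and enables the workaround only for `action=0`. Any other implementer value leaves the existing behavior unchanged.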
## Comparison with Similar Commits
Looking at recent watchdog commits that were backported:
- `c6e89348fd58`: Fixed incorrect timeout calculations - backported with Cc: stable
- `041fdbe73de9`: Fixed timeout programming limits - backported to stable
This commit follows the same pattern of fixing hardware-specific bugs that affect system reliability.
The fix is essential for MediaTek-based systems using SBSA watchdog to prevent unexpected reboots caused by the hardware race condition.
 drivers/watchdog/sbsa_gwdt.c | 50 +++++++++++++++++++++++++++++++++---
 1 file changed, 47 insertions(+), 3 deletions(-)
diff --git a/drivers/watchdog/sbsa_gwdt.c b/drivers/watchdog/sbsa_gwdt.c
index 5f23913ce3b4..6ce1bfb39064 100644
--- a/drivers/watchdog/sbsa_gwdt.c
+++ b/drivers/watchdog/sbsa_gwdt.c
@@ -75,11 +75,17 @@
 #define SBSA_GWDT_VERSION_MASK 0xF
 #define SBSA_GWDT_VERSION_SHIFT 16
 
+#define SBSA_GWDT_IMPL_MASK	0x7FF
+#define SBSA_GWDT_IMPL_SHIFT	0
+#define SBSA_GWDT_IMPL_MEDIATEK	0x426
+
 /**
  * struct sbsa_gwdt - Internal representation of the SBSA GWDT
  * @wdd:		kernel watchdog_device structure
  * @clk:		store the System Counter clock frequency, in Hz.
  * @version:		store the architecture version
+ * @need_ws0_race_workaround:
+ *			indicate whether to adjust wdd->timeout to avoid a race with WS0
  * @refresh_base:	Virtual address of the watchdog refresh frame
  * @control_base:	Virtual address of the watchdog control frame
  */
@@ -87,6 +93,7 @@ struct sbsa_gwdt {
 	struct watchdog_device	wdd;
 	u32			clk;
 	int			version;
+	bool			need_ws0_race_workaround;
 	void __iomem		*refresh_base;
 	void __iomem		*control_base;
 };
@@ -161,6 +168,31 @@ static int sbsa_gwdt_set_timeout(struct watchdog_device *wdd,
 		 */
 		sbsa_gwdt_reg_write(((u64)gwdt->clk / 2) * timeout, gwdt);
 
+	/*
+	 * Some watchdog hardware has a race condition where it will ignore
+	 * sbsa_gwdt_keepalive() if it is called at the exact moment that a
+	 * timeout occurs and WS0 is being asserted. Unfortunately, the default
+	 * behavior of the watchdog core is very likely to trigger this race
+	 * when action=0 because it programs WOR to be half of the desired
+	 * timeout, and watchdog_next_keepalive() chooses the exact same time to
+	 * send keepalive pings.
+	 *
+	 * This triggers a race where sbsa_gwdt_keepalive() can be called right
+	 * as WS0 is being asserted, and affected hardware will ignore that
+	 * write and continue to assert WS0. After another (timeout / 2)
+	 * seconds, the same race happens again. If the driver wins then the
+	 * explicit refresh will reset WS0 to false but if the hardware wins,
+	 * then WS1 is asserted and the system resets.
+	 *
+	 * Avoid the problem by scheduling keepalive heartbeats one second later
+	 * than the WOR timeout.
+	 *
+	 * This workaround might not be needed in a future revision of the
+	 * hardware.
+	 */
+	if (gwdt->need_ws0_race_workaround)
+		wdd->min_hw_heartbeat_ms = timeout * 500 + 1000;
+
 	return 0;
 }
 
@@ -202,12 +234,15 @@ static int sbsa_gwdt_keepalive(struct watchdog_device *wdd)
 static void sbsa_gwdt_get_version(struct watchdog_device *wdd)
 {
 	struct sbsa_gwdt *gwdt = watchdog_get_drvdata(wdd);
-	int ver;
+	int iidr, ver, impl;
 
-	ver = readl(gwdt->control_base + SBSA_GWDT_W_IIDR);
-	ver = (ver >> SBSA_GWDT_VERSION_SHIFT) & SBSA_GWDT_VERSION_MASK;
+	iidr = readl(gwdt->control_base + SBSA_GWDT_W_IIDR);
+	ver = (iidr >> SBSA_GWDT_VERSION_SHIFT) & SBSA_GWDT_VERSION_MASK;
+	impl = (iidr >> SBSA_GWDT_IMPL_SHIFT) & SBSA_GWDT_IMPL_MASK;
 
 	gwdt->version = ver;
+	gwdt->need_ws0_race_workaround =
+		!action && (impl == SBSA_GWDT_IMPL_MEDIATEK);
 }
 
 static int sbsa_gwdt_start(struct watchdog_device *wdd)
@@ -299,6 +334,15 @@ static int sbsa_gwdt_probe(struct platform_device *pdev)
 	else
 		wdd->max_hw_heartbeat_ms = GENMASK_ULL(47, 0) / gwdt->clk * 1000;
 
+	if (gwdt->need_ws0_race_workaround) {
+		/*
+		 * A timeout of 3 seconds means that WOR will be set to 1.5
+		 * seconds and the heartbeat will be scheduled every 2.5
+		 * seconds.
+		 */
+		wdd->min_timeout = 3;
+	}
+
 	status = readl(cf_base + SBSA_GWDT_WCS);
 	if (status & SBSA_GWDT_WCS_WS1) {
 		dev_warn(dev, "System reset by WDT.\n");