From: Markus Stockhausen <markus.stockhausen@gmx.de>
[ Upstream commit e7a25106335041aeca4fdf50a84804c90142c886 ]
The OpenWrt distribution has switched from longterm kernel 6.6 to 6.12. Reports show that devices with the Realtek Otto switch platform die during operation and are rebooted by the watchdog. After ruling out other possible causes, the Otto timer was identified as the culprit. The platform currently consists of 4 targets with different hardware revisions. It is not 100% clear which devices and revisions are affected.
Analysis shows:
A more aggressive sched/deadline handling leads to more timer starts with small intervals. This increases the chance of hitting the bug. See https://marc.info/?l=linux-kernel&m=175276556023276&w=2
Focusing on the real issue, a hardware limitation was found on some devices: there is a small chance that a timer expires without firing an interrupt if it is reprogrammed within the 5us before its expiration time. Work around this issue by introducing a bounce() function. It restarts the timer directly before the normal restart functions, as follows:
- Stop timer
- Restart timer with a slow frequency.
- Target time will be >5us
- The subsequent normal restart is outside the critical window
Downstream has already tested and confirmed a patch. See
https://github.com/openwrt/openwrt/pull/19468
https://forum.openwrt.org/t/support-for-rtl838x-based-managed-switches/57875...
Signed-off-by: Markus Stockhausen <markus.stockhausen@gmx.de>
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Tested-by: Stephen Howell <howels@allthatwemight.be>
Tested-by: Bjørn Mork <bjorn@mork.no>
Link: https://lore.kernel.org/r/20250804080328.2609287-2-markus.stockhausen@gmx.de
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
LLM Generated explanations, may be completely bogus:
YES
- Fixes a real user-visible bug: The commit addresses a hardware timing erratum on some Realtek Otto SoCs where reprogramming a running timer within ~5 µs of its expiration can cause the next interrupt to be lost, leading to hangs and watchdog resets. This is confirmed by field reports (OpenWrt) and multiple Tested-by tags in the commit message. It is an important reliability fix, not a feature.
- Small, contained change in one driver: The patch is limited to `drivers/clocksource/timer-rtl-otto.c`. It introduces a minimal helper and three call sites; no API or architectural changes.
- Core idea and code changes:
  - Adds `RTTM_MAX_DIVISOR` to select the slowest prescaler for a brief "bounce" restart that moves the timer safely away from the <5 µs danger window (the define is added next to the other timer constants).
  - Introduces `rttm_bounce_timer()`, which disables and immediately re-enables the timer with the slowest divisor. It preserves the current period so that the follow-up reprogramming happens well outside the critical window.
    - The new helper is placed after the IRQ handler and before the stop/start helpers.
  - Wires the bounce into all clockevent reprogram paths by calling it just before the existing stop/program/start sequence:
    - `rttm_next_event()` adds the bounce before `rttm_stop_timer()` (see the current function start at drivers/clocksource/timer-rtl-otto.c:127).
    - `rttm_state_oneshot()` adds the bounce before `rttm_stop_timer()` (drivers/clocksource/timer-rtl-otto.c:139).
    - `rttm_state_periodic()` adds the bounce before `rttm_stop_timer()` (drivers/clocksource/timer-rtl-otto.c:151).
  - The clocksource path remains untouched (e.g., `rttm_enable_clocksource()` at drivers/clocksource/timer-rtl-otto.c:204). This is appropriate because the bug is triggered by frequent reprogramming of the clockevent timers, not by the continuously running clocksource.
- Rationale for safety and effectiveness:
  - The bounce sequence is purely local to the Otto timer MMIO block. It uses existing primitives (`rttm_disable_timer()`, `rttm_enable_timer()`), preserving established semantics while creating a safe temporal margin before the normal reprogramming.
  - Using `RTTM_MAX_DIVISOR` drops the effective tick frequency to roughly 3 kHz (per the code comment). The remaining time to expiration is therefore far beyond 5 µs even with the minimal period (`>= RTTM_MIN_DELTA`, 8 ticks, i.e. about 8 / 3 kHz ≈ 2.7 ms), which eliminates the observed race window.
  - The stop/ack/program/start logic remains identical apart from the preamble bounce. Pending IRQs are still acknowledged in `rttm_stop_timer()`, as before, so the change does not introduce new interrupt-handling semantics.
- Scope and regression risk:
  - Limited to the Realtek Otto timer driver; no impact on other platforms or subsystems.
  - No ABI/DT/Kconfig changes; no scheduling or generic timekeeping changes.
  - Minimal runtime overhead (a couple of MMIO writes per reprogram) is acceptable compared with preventing system hangs.
- Stable backport suitability:
  - The driver is present in stable trees starting with v6.11 (verified: drivers/clocksource/timer-rtl-otto.c exists in v6.11 and v6.12). The bug has real-world impact with OpenWrt on 6.12. Hence backporting to 6.11.y, 6.12.y, and newer stable series that include this driver is appropriate.
  - The patch is self-contained and does not depend on recent framework changes.
Conclusion: This is a targeted, low-risk workaround for a serious hardware erratum affecting deployed systems. It cleanly fits stable criteria and should be backported to all stable series that contain `drivers/clocksource/timer-rtl-otto.c`.
 drivers/clocksource/timer-rtl-otto.c | 20 ++++++++++++++++++++
 1 file changed, 20 insertions(+)

diff --git a/drivers/clocksource/timer-rtl-otto.c b/drivers/clocksource/timer-rtl-otto.c
index 8a3068b36e752..8be45a11fb8b6 100644
--- a/drivers/clocksource/timer-rtl-otto.c
+++ b/drivers/clocksource/timer-rtl-otto.c
@@ -38,6 +38,7 @@
 #define RTTM_BIT_COUNT 28
 #define RTTM_MIN_DELTA 8
 #define RTTM_MAX_DELTA CLOCKSOURCE_MASK(28)
+#define RTTM_MAX_DIVISOR GENMASK(15, 0)
 
 /*
  * Timers are derived from the LXB clock frequency. Usually this is a fixed
@@ -112,6 +113,22 @@ static irqreturn_t rttm_timer_interrupt(int irq, void *dev_id)
 	return IRQ_HANDLED;
 }
 
+static void rttm_bounce_timer(void __iomem *base, u32 mode)
+{
+	/*
+	 * When a running timer has less than ~5us left, a stop/start sequence
+	 * might fail. While the details are unknown the most evident effect is
+	 * that the subsequent interrupt will not be fired.
+	 *
+	 * As a workaround issue an intermediate restart with a very slow
+	 * frequency of ~3kHz keeping the target counter (>=8). So the follow
+	 * up restart will always be issued outside the critical window.
+	 */
+
+	rttm_disable_timer(base);
+	rttm_enable_timer(base, mode, RTTM_MAX_DIVISOR);
+}
+
 static void rttm_stop_timer(void __iomem *base)
 {
 	rttm_disable_timer(base);
@@ -129,6 +146,7 @@ static int rttm_next_event(unsigned long delta, struct clock_event_device *clkev
 	struct timer_of *to = to_timer_of(clkevt);
 
 	RTTM_DEBUG(to->of_base.base);
+	rttm_bounce_timer(to->of_base.base, RTTM_CTRL_COUNTER);
 	rttm_stop_timer(to->of_base.base);
 	rttm_set_period(to->of_base.base, delta);
 	rttm_start_timer(to, RTTM_CTRL_COUNTER);
@@ -141,6 +159,7 @@ static int rttm_state_oneshot(struct clock_event_device *clkevt)
 	struct timer_of *to = to_timer_of(clkevt);
 
 	RTTM_DEBUG(to->of_base.base);
+	rttm_bounce_timer(to->of_base.base, RTTM_CTRL_COUNTER);
 	rttm_stop_timer(to->of_base.base);
 	rttm_set_period(to->of_base.base, RTTM_TICKS_PER_SEC / HZ);
 	rttm_start_timer(to, RTTM_CTRL_COUNTER);
@@ -153,6 +172,7 @@ static int rttm_state_periodic(struct clock_event_device *clkevt)
 	struct timer_of *to = to_timer_of(clkevt);
 
 	RTTM_DEBUG(to->of_base.base);
+	rttm_bounce_timer(to->of_base.base, RTTM_CTRL_TIMER);
 	rttm_stop_timer(to->of_base.base);
 	rttm_set_period(to->of_base.base, RTTM_TICKS_PER_SEC / HZ);
 	rttm_start_timer(to, RTTM_CTRL_TIMER);