From: Lijo Lazar <lijo.lazar@amd.com>
[ Upstream commit e87577ef6daa0cfb10ca139c720f0c57bd894174 ]
Cached metrics data validity is 1ms on aldebaran. It's not reasonable for any client to query gpu_metrics at a faster rate and constantly interrupt PMFW.
Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Asad Kamal <asad.kamal@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
LLM Generated explanations, may be completely bogus:
YES
Rationale
- What changed: In `drivers/gpu/drm/amd/pm/swsmu/smu13/aldebaran_ppt.c:1717`, the call `smu_cmn_get_metrics_table(smu, &metrics, true)` is switched to `... false`. This flips the `bypass_cache` flag so Aldebaran’s `aldebaran_get_gpu_metrics()` uses the cached metrics instead of forcing a fresh PMFW query every time.
- Cache semantics: `smu_cmn_get_metrics_table()` caches SMU metrics for 1 ms and refreshes only if the cache is older or bypassed. See `drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c:1023` (1 ms validity), `...:1034` (updates and timestamps cache).
- Consistency with existing Aldebaran paths: Other Aldebaran helpers already use the cached path, e.g. `aldebaran_get_smu_metrics_data()` calls `smu_cmn_get_metrics_table(smu, NULL, false)` to reuse cached metrics (drivers/gpu/drm/amd/pm/swsmu/smu13/aldebaran_ppt.c:618). This change makes `get_gpu_metrics` consistent with those helpers.
- Why it matters: Forcing fresh metrics on every `gpu_metrics` read causes frequent SMU/PMFW interactions. On Aldebaran, cached metrics are valid for 1 ms (as the commit message notes). Using the cache avoids needless PMFW interrupts when clients poll faster than 1 kHz, improving firmware responsiveness and reducing overhead. The returned data can at most be 1 ms old, which is within the defined validity window.
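The effect of the `bypass_cache` flag described above can be illustrated with a small user-space simulation. This is only a sketch under stated assumptions, not the driver's implementation: `struct metrics_cache`, `query_firmware()`, `get_metrics()` and `simulate_polling()` are hypothetical names, time is a plain microsecond counter rather than jiffies, and the firmware query is faked. Only the 1 ms validity window and the meaning of the flag are taken from the text.

```c
#include <stdbool.h>
#include <stdint.h>

/* 1 ms validity window, as stated in the commit message. */
#define CACHE_VALIDITY_US 1000ULL

/* Hypothetical stand-in for the real metrics table. */
struct metrics {
	uint32_t temperature;
};

/* Hypothetical cache; not the driver's data structure. */
struct metrics_cache {
	struct metrics table;
	uint64_t fetched_at_us;   /* 0 means "never fetched" */
	unsigned int fw_queries;  /* simulated firmware round trips */
};

/* Fake firmware query: fills the table and timestamps the cache. */
static void query_firmware(struct metrics_cache *c, uint64_t now_us)
{
	c->table.temperature = 40 + (uint32_t)((now_us / 1000) % 10);
	c->fetched_at_us = now_us;
	c->fw_queries++;
}

/*
 * Refresh when the caller bypasses the cache, when nothing has been
 * fetched yet, or when the cached copy is at least 1 ms old.
 * Assumes now_us is never 0, since 0 is the "never fetched" marker.
 */
static void get_metrics(struct metrics_cache *c, struct metrics *out,
			bool bypass_cache, uint64_t now_us)
{
	bool stale = c->fetched_at_us == 0 ||
		     now_us - c->fetched_at_us >= CACHE_VALIDITY_US;

	if (bypass_cache || stale)
		query_firmware(c, now_us);
	if (out)
		*out = c->table;
}

/* Poll every 100 us for 10 ms; return how often firmware was queried. */
static unsigned int simulate_polling(bool bypass_cache)
{
	struct metrics_cache cache = {0};
	struct metrics m;

	for (uint64_t now = 100; now <= 10000; now += 100)
		get_metrics(&cache, &m, bypass_cache, now);
	return cache.fw_queries;
}
```

In this simulation a client polling at 10 kHz triggers 100 simulated firmware queries with `simulate_polling(true)` and 10 with `simulate_polling(false)`, which is the kind of reduction the patch aims for.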
Risk and scope
- Minimal change, localized to Aldebaran: One boolean flip in an Aldebaran-specific function; no architectural or API changes; no cross-subsystem impact.
- Behavior impact is bounded: Only affects callers that poll faster than 1 ms; they now see properly cached values (up to 1 ms old) rather than forcing a fresh read. This matches the established 1 ms cache policy in `smu_cmn_get_metrics_table`.
- Safe initialization: Metrics cache is initialized to 0 so the first fetch always refreshes (drivers/gpu/drm/amd/pm/swsmu/smu13/aldebaran_ppt.c:250).
- No security or correctness regressions: Reading slightly-cached telemetry is expected and already used elsewhere; avoids performance pitfalls from excessive PMFW interrupts.
Stable backport criteria
- Fixes a real-world issue (excessive PMFW interrupts / overhead under high-frequency polling) that can affect users.
- Small, contained change with low regression risk.
- No new features or ABI changes; aligns behavior with existing cache policy and other Aldebaran code paths.
- Touches a single driver component without architectural refactoring.
Given the narrow scope, clear benefit, and low risk, this is a good candidate for stable backport.
 drivers/gpu/drm/amd/pm/swsmu/smu13/aldebaran_ppt.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/amd/pm/swsmu/smu13/aldebaran_ppt.c b/drivers/gpu/drm/amd/pm/swsmu/smu13/aldebaran_ppt.c
index c63d2e28954d0..b067147b7c41f 100644
--- a/drivers/gpu/drm/amd/pm/swsmu/smu13/aldebaran_ppt.c
+++ b/drivers/gpu/drm/amd/pm/swsmu/smu13/aldebaran_ppt.c
@@ -1781,7 +1781,7 @@ static ssize_t aldebaran_get_gpu_metrics(struct smu_context *smu,
 	ret = smu_cmn_get_metrics_table(smu,
 					&metrics,
-					true);
+					false);
 	if (ret)
 		return ret;