From: Lijo Lazar <lijo.lazar@amd.com>
[ Upstream commit 2f3b1ccf83be83a3330e38194ddfd1a91fec69be ]
Cached metrics data validity is 1ms on arcturus. It's not reasonable for any client to query gpu_metrics at a faster rate and constantly interrupt PMFW.
Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Asad Kamal <asad.kamal@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
LLM Generated explanations, may be completely bogus:
YES
- What changed
  - In `drivers/gpu/drm/amd/pm/swsmu/smu11/arcturus_ppt.c:2327`, `arcturus_get_gpu_metrics()` currently calls `smu_cmn_get_metrics_table(smu, &metrics, true);`, i.e., it always bypasses the cache and forces a PMFW interaction. The commit flips the third argument to `false`, switching to the existing 1 ms cache.
  - The callee’s API explicitly defines the third parameter as `bypass_cache` (see `drivers/gpu/drm/amd/pm/swsmu/smu_cmn.h:122`), so `false` means “use cached metrics.”
  - The common metrics helper implements a 1 ms cache window (see `drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c:1013-1041`), only fetching fresh data if either explicitly bypassed or the last fetch is older than 1 ms. The per-ASIC table initialization sets `metrics_time = 0`, ensuring the first call still fetches fresh metrics (see `drivers/gpu/drm/amd/pm/swsmu/smu11/arcturus_ppt.c:274`).
- Why it matters (bug-like behavior and user impact)
  - The current arcturus path always bypasses the cache on every `gpu_metrics` query (see `arcturus_ppt.c:2327-2329`), needlessly interrupting PMFW for callers that poll frequently. The commit message states cache validity is 1 ms on arcturus and that frequent queries “constantly interrupt PMFW,” which is undesirable and can degrade performance or reliability.
  - Using the cache still guarantees data freshness within 1 ms and avoids spamming PMFW when clients poll faster than that. From user space, the only observable difference is that very high-rate queries (>1 kHz) won’t force a new PMFW read each time; metrics can be up to 1 ms old. This aligns with the existing caching design and does not change the ABI or data layout returned by `gpu_metrics` (the rest of the function remains unchanged; e.g., field population and `system_clock_counter` at `drivers/gpu/drm/amd/pm/swsmu/smu11/arcturus_ppt.c:2373`).
- Scope and risk assessment
  - Change is one boolean flip in a single ASIC-specific path, no architectural changes, no cross-subsystem effects.
  - It leverages existing, well-tested caching in `smu_cmn_get_metrics_table()`; behavior falls back to the exact update path after at most 1 ms (`smu_cmn.c:1022-1035`).
  - Initial fetch correctness is preserved because `metrics_time` starts at zero (`arcturus_ppt.c:274`), so the first call is always fresh.
  - Potential side effects are minimal: clients polling at sub-millisecond rates may see identical metrics across calls within a 1 ms window, which is explicitly intended by the caching policy and called out in the commit rationale.
- Stable backport criteria
  - Fixes a real, user-facing problem: unnecessary PMFW interruptions from high-frequency polling, which can affect performance and system behavior.
  - The change is small, contained, and low risk (one-argument change).
  - No new features or ABI changes; it aligns arcturus with the existing 1 ms caching policy implemented in common code.
  - No major architectural changes or complex dependencies; the function signature and caching logic already exist in stable trees.
Given the minimal, targeted nature of the change, its alignment with existing caching semantics, and the clear benefit of reducing PMFW interruptions without altering user-visible interfaces, this is a good candidate for stable backporting.
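For illustration only, the 1 ms cache window discussed above can be sketched in standalone C. This is not the kernel code: `struct metrics_cache`, `fake_fw_read()`, `get_metrics()` and the microsecond clock passed in by the caller are all hypothetical stand-ins, and the real helper works on jiffies and the SMU table context rather than on these types. The sketch only assumes the three conditions stated above: an explicit bypass, a zero timestamp meaning "never fetched", and a last fetch older than 1 ms.

```c
#include <stdbool.h>
#include <stdint.h>

#define CACHE_WINDOW_US 1000u	/* assumed 1 ms validity window */

/* Hypothetical stand-in for the driver's cached metrics state. */
struct metrics_cache {
	uint64_t last_fetch_us;	/* 0 means "never fetched" */
	uint32_t table;		/* cached firmware data (one word here) */
	unsigned int fw_reads;	/* number of simulated firmware reads */
};

/* Hypothetical firmware read: returns a new value on every call. */
static uint32_t fake_fw_read(struct metrics_cache *c)
{
	c->fw_reads++;
	return 100u + c->fw_reads;
}

/*
 * Return the metrics, refreshing them from "firmware" only when the
 * caller bypasses the cache, nothing has been fetched yet, or the
 * cached copy is older than the validity window.
 * now_us is a caller-supplied, nonzero, monotonically increasing time.
 */
static uint32_t get_metrics(struct metrics_cache *c, uint64_t now_us,
			    bool bypass_cache)
{
	if (bypass_cache ||
	    c->last_fetch_us == 0 ||
	    now_us - c->last_fetch_us > CACHE_WINDOW_US) {
		c->table = fake_fw_read(c);
		c->last_fetch_us = now_us;
	}
	return c->table;
}
```

With `bypass_cache` set to `true` on every call, as in the pre-patch arcturus path, each query triggers a simulated firmware read. With `false`, two calls less than 1 ms apart return the same cached value and only the first one reaches the firmware.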
 drivers/gpu/drm/amd/pm/swsmu/smu11/arcturus_ppt.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/amd/pm/swsmu/smu11/arcturus_ppt.c b/drivers/gpu/drm/amd/pm/swsmu/smu11/arcturus_ppt.c
index 9ad46f545d15c..599eddb5a67d5 100644
--- a/drivers/gpu/drm/amd/pm/swsmu/smu11/arcturus_ppt.c
+++ b/drivers/gpu/drm/amd/pm/swsmu/smu11/arcturus_ppt.c
@@ -1897,7 +1897,7 @@ static ssize_t arcturus_get_gpu_metrics(struct smu_context *smu,
 
 	ret = smu_cmn_get_metrics_table(smu,
 					&metrics,
-					true);
+					false);
 	if (ret)
 		return ret;
 