From: Sathishkumar S sathishkumar.sundararaju@amd.com
[ Upstream commit 0e7581eda8c76d1ca4cf519631a4d4eb9f82b94c ]
Acquire jpeg_pg_lock before changes to jpeg power state and release it after power off from idle work handler.
Signed-off-by: Sathishkumar S sathishkumar.sundararaju@amd.com Reviewed-by: Leo Liu leo.liu@amd.com Signed-off-by: Alex Deucher alexander.deucher@amd.com Signed-off-by: Sasha Levin sashal@kernel.org ---
LLM Generated explanations, may be completely bogus:
YES
- What it fixes - The idle path gated JPEG power without holding the JPEG power-gating mutex, while the active path already ungated under that mutex. This asymmetry creates a race between idle gating and new work starting, risking incorrect hardware state. - In `drivers/gpu/drm/amd/amdgpu/amdgpu_jpeg.c:121`, the idle worker currently gates power via `amdgpu_device_ip_set_powergating_state(..., AMD_PG_STATE_GATE)` without any lock. The commit wraps this call with `mutex_lock(&adev->jpeg.jpeg_pg_lock)`/`mutex_unlock(...)`, making it consistent with the active path. - The active path already uses the same lock when ungating: `drivers/gpu/drm/amd/amdgpu/amdgpu_jpeg.c:134` (lock), `amdgpu_jpeg.c:135-136` (UNGATE), `amdgpu_jpeg.c:137` (unlock). The commit brings the idle gating path to the same locking discipline.
- Why the race matters - JPEG IP set_powergating handlers update a shared state (`adev->jpeg.cur_state`) without internal locking and short-circuit based on it: - See `drivers/gpu/drm/amd/amdgpu/jpeg_v4_0.c:661` (return early if desired state equals `cur_state`) and `jpeg_v4_0.c:670` (write `cur_state` after a successful transition). Other JPEG versions follow the same pattern. - Without synchronization, idle gating and active ungating can interleave such that: - One thread may read an outdated `cur_state` and skip a necessary transition. - Another may update `cur_state` last, leaving the software view mismatched with actual hardware state. - Practical impact includes power-gating the JPEG block while it’s about to be used or is in use, leading to decode failures, ring stalls/timeouts, or sporadic hangs when work arrives near idle transitions.
- Scope and risk - Small, contained change in a single file (`amdgpu_jpeg.c`) affecting only the JPEG idle work handler and using an already-present mutex (`amdgpu_jpeg.c:42` initializes `jpeg_pg_lock`). - No API or architectural changes; just adds the missing lock/unlock in the idle gating path. - Workqueue context is safe for `mutex_lock`. The active path cancels the idle work (`cancel_delayed_work_sync`) before taking the same mutex, minimizing contention and avoiding deadlocks.
- Stable/backport criteria - Fixes a real race condition that can affect end users (sporadic JPEG decode malfunctions when idle gating collides with new submissions). - Minimal and low risk; aligns two code paths to the same locking policy. - No feature additions or broad refactoring. - Although the commit message lacks Fixes/Cc stable tags, it is a clear correctness fix in a driver subsystem and fits stable policy.
Conclusion: This is an important, low-risk race fix in the AMDGPU JPEG power management path. It should be backported to stable.
drivers/gpu/drm/amd/amdgpu/amdgpu_jpeg.c | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_jpeg.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_jpeg.c index 82d58ac7afb01..5d5e9ee83a5d6 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_jpeg.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_jpeg.c @@ -121,10 +121,12 @@ static void amdgpu_jpeg_idle_work_handler(struct work_struct *work) fences += amdgpu_fence_count_emitted(&adev->jpeg.inst[i].ring_dec[j]); }
- if (!fences && !atomic_read(&adev->jpeg.total_submission_cnt)) + if (!fences && !atomic_read(&adev->jpeg.total_submission_cnt)) { + mutex_lock(&adev->jpeg.jpeg_pg_lock); amdgpu_device_ip_set_powergating_state(adev, AMD_IP_BLOCK_TYPE_JPEG, AMD_PG_STATE_GATE); - else + mutex_unlock(&adev->jpeg.jpeg_pg_lock); + } else schedule_delayed_work(&adev->jpeg.idle_work, JPEG_IDLE_TIMEOUT); }