From: Alex Deucher <alexander.deucher@amd.com>
[ Upstream commit ec813f384b1a9df332e86ff46c422e5d2d00217f ]
We need to cancel any outstanding work at both suspend and driver teardown. Move the cancel to hw_fini which gets called in both cases.
Reviewed-by: David (Ming Qiang) Wu <David.Wu3@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
LLM Generated explanations, may be completely bogus:
YES
- What it fixes: Prevents a race/use-after-free by ensuring the VPE idle delayed work is canceled during both suspend and full driver teardown. The idle work is initialized and used for power-gating idling (drivers/gpu/drm/amd/amdgpu/amdgpu_vpe.c:355), is scheduled during normal operation (drivers/gpu/drm/amd/amdgpu/amdgpu_vpe.c:873), and can reschedule itself (drivers/gpu/drm/amd/amdgpu/amdgpu_vpe.c:336). Without a cancel on teardown, it may run after the ring has been stopped or while/after resources are being freed (see the first sketch after this list).
- Change details (the resulting functions are reconstructed just before the diff below):
  - Added cancellation to the teardown path: `cancel_delayed_work_sync(&adev->vpe.idle_work);` in `vpe_hw_fini()`, so it runs on both suspend and teardown (drivers/gpu/drm/amd/amdgpu/amdgpu_vpe.c:439).
  - Simplified the suspend path to delegate to `vpe_hw_fini()` (drivers/gpu/drm/amd/amdgpu/amdgpu_vpe.c:449–451), ensuring the same cancel happens on suspend.
  - The IP block ops confirm that suspend and hw_fini are the hooks used by both flows (drivers/gpu/drm/amd/amdgpu/amdgpu_vpe.c:978–981).
- Why it matters: The idle work handler toggles power gating (drivers/gpu/drm/amd/amdgpu/amdgpu_vpe.c:333–336). If it remains queued while the ring is being stopped (drivers/gpu/drm/amd/amdgpu/amdgpu_vpe.c:441) or while memory is being torn down in `sw_fini()`, it can touch freed objects or hardware in an invalid state. Moving the cancel to `hw_fini()` guarantees the work is canceled before the ring is stopped and the block is power-gated, and before the rest of teardown.
- Scope and risk:
  - Small, contained change in a single driver file; no API or architectural change.
  - Uses the standard `cancel_delayed_work_sync()`, which waits for a running handler to finish and prevents it from requeueing.
  - Consistent with existing practice elsewhere in amdgpu; e.g., the idle work is already canceled when the ring is about to be used (drivers/gpu/drm/amd/amdgpu/amdgpu_vpe.c:849; see the second sketch after this list).
  - Low regression risk; only affects the teardown/suspend sequencing of a driver-local delayed work.
- Stable backport fit:
  - Fixes a real race that can cause crashes during suspend or driver unload (a user-visible reliability issue).
  - Minimal and targeted; no new features.
  - Touches a non-core subsystem (amdgpu VPE), keeping risk bounded.
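First sketch: to make the race concrete, here is a minimal, self-contained illustration (a trivial kernel module, not the amdgpu code; all names are hypothetical) of the pattern described above: a delayed work item that can reschedule itself, and a teardown path that must use `cancel_delayed_work_sync()` so the handler can neither still be running nor requeue itself once teardown proceeds.

```c
/* Minimal sketch, not the driver's code: a self-rescheduling delayed
 * work and the synchronous cancel that teardown needs.
 */
#include <linux/module.h>
#include <linux/workqueue.h>
#include <linux/jiffies.h>

static struct delayed_work idle_work;
static bool block_busy;		/* stand-in for "ring still has fences" */

static void idle_work_handler(struct work_struct *work)
{
	if (block_busy) {
		/* Still busy: requeue ourselves, as the VPE idle work can.
		 * This is why the cancel must be synchronous and must come
		 * before the ring/hardware is torn down.
		 */
		schedule_delayed_work(&idle_work, msecs_to_jiffies(1000));
		return;
	}
	pr_info("idle: power-gate the block here\n");
}

static int __init idle_demo_init(void)
{
	INIT_DELAYED_WORK(&idle_work, idle_work_handler);
	schedule_delayed_work(&idle_work, msecs_to_jiffies(1000));
	return 0;
}

static void __exit idle_demo_exit(void)
{
	/* Equivalent of the hunk added to vpe_hw_fini(): wait for a running
	 * handler to finish and forbid requeueing before anything the
	 * handler touches goes away.
	 */
	cancel_delayed_work_sync(&idle_work);
}

module_init(idle_demo_init);
module_exit(idle_demo_exit);
MODULE_LICENSE("GPL");
MODULE_DESCRIPTION("delayed-work teardown sketch");
```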
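Second sketch: the "existing practice" point refers to the ring usage hooks, where the idle work is canceled when a job is about to run and re-armed once the ring goes quiet. The following is a hedged approximation of that pattern, not the driver's exact code: includes are omitted, the real hooks also handle power-gating state, and the driver uses its own timeout constant.

```c
/* Hedged approximation of the ring begin_use/end_use idle-work pattern. */
static void vpe_ring_begin_use(struct amdgpu_ring *ring)
{
	struct amdgpu_device *adev = ring->adev;

	/* A job is about to run: the idle handler must not power-gate
	 * the block underneath it (cf. amdgpu_vpe.c:849).
	 */
	cancel_delayed_work_sync(&adev->vpe.idle_work);
}

static void vpe_ring_end_use(struct amdgpu_ring *ring)
{
	struct amdgpu_device *adev = ring->adev;

	/* Ring went quiet: re-arm the idle timer (the 1 s value here is
	 * illustrative only).
	 */
	schedule_delayed_work(&adev->vpe.idle_work, msecs_to_jiffies(1000));
}
```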
Conclusion: This is a clear, low-risk bug fix that aligns with stable rules and should be backported.
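For quick reference, here is how the two functions read after this patch, reconstructed from the hunks below; the unchanged tail of `vpe_hw_fini()` (which powers off the block) is elided and represented only by a comment and its return.

```c
static int vpe_hw_fini(struct amdgpu_ip_block *ip_block)
{
	struct amdgpu_device *adev = ip_block->adev;
	struct amdgpu_vpe *vpe = &adev->vpe;

	/* New: make sure the idle work can no longer run or requeue. */
	cancel_delayed_work_sync(&adev->vpe.idle_work);

	vpe_ring_stop(vpe);

	/* Power off VPE ... (rest of the function unchanged, elided) */
	return 0;
}

static int vpe_suspend(struct amdgpu_ip_block *ip_block)
{
	/* The cancel now happens inside vpe_hw_fini(), so suspend just
	 * delegates to it.
	 */
	return vpe_hw_fini(ip_block);
}
```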
 drivers/gpu/drm/amd/amdgpu/amdgpu_vpe.c | 6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vpe.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vpe.c
index 121ee17b522bd..dcdb2654ceb4a 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vpe.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vpe.c
@@ -435,6 +435,8 @@ static int vpe_hw_fini(struct amdgpu_ip_block *ip_block)
 	struct amdgpu_device *adev = ip_block->adev;
 	struct amdgpu_vpe *vpe = &adev->vpe;
 
+	cancel_delayed_work_sync(&adev->vpe.idle_work);
+
 	vpe_ring_stop(vpe);
 
 	/* Power off VPE */
@@ -445,10 +447,6 @@ static int vpe_hw_fini(struct amdgpu_ip_block *ip_block)
 
 static int vpe_suspend(struct amdgpu_ip_block *ip_block)
 {
-	struct amdgpu_device *adev = ip_block->adev;
-
-	cancel_delayed_work_sync(&adev->vpe.idle_work);
-
 	return vpe_hw_fini(ip_block);
 }
 