If the IH ring buffer overflows, it's possible that fence signal events
were lost. Check each ring for progress to prevent job timeouts/GPU
hangs due to the fences staying unsignaled despite the work being done.

Cc: Joshua Ashton <joshua@froggi.es>
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Christian König <christian.koenig@amd.com>
Cc: stable@vger.kernel.org

Signed-off-by: Friedrich Vock <friedrich.vock@gmx.de>
---
v2: Set ih->overflow to false after processing fences
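
For reviewers, the recovery idea can be illustrated with a minimal
userspace sketch. It is not driver code: struct sketch_ring,
sketch_fence_process(), sketch_recover_from_overflow() and
SKETCH_MAX_RINGS are hypothetical stand-ins for struct amdgpu_ring,
amdgpu_fence_process(), the new block in amdgpu_ih_process() and
AMDGPU_MAX_RINGS, and the fence state is assumed to reduce to two
sequence counters. Under those assumptions it shows that polling every
initialized ring after an overflow signals completed work even when the
corresponding interrupt never arrived.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_MAX_RINGS 4 /* hypothetical; stands in for AMDGPU_MAX_RINGS */

/* Hypothetical, simplified ring; not the real struct amdgpu_ring. */
struct sketch_ring {
	bool initialized;      /* mirrors fence_drv.initialized */
	uint32_t hw_seq;       /* last sequence number the hardware completed */
	uint32_t signaled_seq; /* last sequence number the driver signaled */
};

/*
 * Signal everything the hardware has completed so far and return how
 * many fences were newly signaled. Unsigned subtraction keeps the count
 * correct across sequence number wraparound.
 */
static uint32_t sketch_fence_process(struct sketch_ring *ring)
{
	uint32_t newly_done;

	assert(ring != NULL);
	newly_done = ring->hw_seq - ring->signaled_seq;
	ring->signaled_seq = ring->hw_seq;
	return newly_done;
}

/*
 * After an overflow the fence interrupts may be gone, so poll every
 * initialized ring instead of waiting for them, then clear the flag.
 */
static uint32_t sketch_recover_from_overflow(struct sketch_ring *rings[SKETCH_MAX_RINGS],
					     bool *overflow)
{
	uint32_t total = 0;
	int i;

	if (!*overflow)
		return 0;

	for (i = 0; i < SKETCH_MAX_RINGS; ++i) {
		struct sketch_ring *ring = rings[i];

		if (!ring || !ring->initialized)
			continue;
		total += sketch_fence_process(ring);
	}
	*overflow = false;
	return total;
}
```

Clearing the flag only after the loop means a second call without a new
overflow does no extra work, which matches the v2 change noted above.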
 drivers/gpu/drm/amd/amdgpu/amdgpu_ih.c | 16 ++++++++++++++++
 1 file changed, 16 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ih.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ih.c
index f3b0aaf3ebc6..4e061f7741d8 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ih.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ih.c
@@ -209,6 +209,7 @@ int amdgpu_ih_process(struct amdgpu_device *adev, struct amdgpu_ih_ring *ih)
 {
 	unsigned int count;
 	u32 wptr;
+	int i;
 
 	if (!ih->enabled || adev->shutdown)
 		return IRQ_NONE;
@@ -227,6 +228,21 @@ int amdgpu_ih_process(struct amdgpu_device *adev, struct amdgpu_ih_ring *ih)
 		ih->rptr &= ih->ptr_mask;
 	}
 
+	/* If the ring buffer overflowed, we might have lost some fence
+	 * signal interrupts. Check if there was any activity so the signal
+	 * doesn't get lost.
+	 */
+	if (ih->overflow) {
+		for (i = 0; i < AMDGPU_MAX_RINGS; ++i) {
+			struct amdgpu_ring *ring = adev->rings[i];
+
+			if (!ring || !ring->fence_drv.initialized)
+				continue;
+			amdgpu_fence_process(ring);
+		}
+		ih->overflow = false;
+	}
+
 	amdgpu_ih_set_rptr(adev, ih);
 	wake_up_all(&ih->wait_process);
 
-- 
2.43.0