Re: [PATCH 3/3] drm/amdgpu: Increase soft recovery timeout to .5s

8 Mar 2024

It definitely takes much longer than 10-20ms in some instances.
Some of these instances can even be shown in Friedrich's hang test suite
-- specifically when there are a lot of page faults going on.
The work (or parts of the work) could also be pending and not in any 
wave yet, just hanging out in the ring. There may be a better solution 
to that, but I don't know it.
Raising it to .5s still makes sense to me.
- Joshie 🐸✨
On 3/8/24 08:29, Christian König wrote:
...
On 07.03.24 at 20:04, Joshua Ashton wrote:
...
Results in much more reliable soft recovery on
Steam Deck.
Waiting 500ms for a locked-up shader is way too long, I think. We could
increase the 10ms to something like 20ms, but I really wouldn't go much
over that.
This just kills shaders which are stuck in an endless loop; if that
takes longer than 10-20ms, we really have a hardware problem which needs
a full reset to resolve.
Regards,
Christian.
...
Signed-off-by: Joshua Ashton <joshua@froggi.es>
Cc: Friedrich Vock <friedrich.vock@gmx.de>
Cc: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Cc: Christian König <christian.koenig@amd.com>
Cc: André Almeida <andrealmeid@igalia.com>
Cc: stable@vger.kernel.org

 drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c
index 57c94901ed0a..be99db0e077e 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c
@@ -448,7 +448,7 @@ bool amdgpu_ring_soft_recovery(struct amdgpu_ring *ring, unsigned int vmid,
      spin_unlock_irqrestore(fence->lock, flags);
      atomic_inc(&ring->adev->gpu_reset_counter);
-    deadline = ktime_add_us(ktime_get(), 10000);
+    deadline = ktime_add_ms(ktime_get(), 500);
      while (!dma_fence_is_signaled(fence) &&
             ktime_to_ns(ktime_sub(deadline, ktime_get())) > 0)
          ring->funcs->soft_recovery(ring, vmid);

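To make the timeout trade-off concrete, here is a userspace sketch (not kernel code) of the retry-until-deadline loop from the hunk above. Only the loop shape comes from the patch; everything else is an assumption for illustration: now_ns() stands in for ktime_get(), while fake_fence and fake_soft_recovery() are hypothetical stand-ins for the dma_fence and ring->funcs->soft_recovery(). The fake fence simply signals at a fixed time, modelling work that needs a while (e.g. heavy page faulting) before it can finish. It also assumes the function reports afterwards whether the fence signaled.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <time.h>

#define NSEC_PER_USEC 1000LL
#define NSEC_PER_MSEC 1000000LL
#define NSEC_PER_SEC 1000000000LL

/* Monotonic time in ns; a userspace stand-in for the kernel's ktime_get(). */
static int64_t now_ns(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (int64_t)ts.tv_sec * NSEC_PER_SEC + ts.tv_nsec;
}

/*
 * Hypothetical fake fence: it "signals" once the monotonic clock passes
 * signal_at_ns, modelling work that needs some time before it can finish.
 */
struct fake_fence {
	int64_t signal_at_ns;
	long kicks;	/* soft-recovery attempts issued so far */
};

static bool fake_fence_is_signaled(const struct fake_fence *f)
{
	return now_ns() >= f->signal_at_ns;
}

/* Hypothetical stand-in for ring->funcs->soft_recovery(ring, vmid). */
static void fake_soft_recovery(struct fake_fence *f)
{
	f->kicks++;
}

/*
 * Same shape as the loop in the patch: keep issuing soft recovery until
 * the fence signals or the deadline passes. Returns whether the fence
 * signaled (assumed); false is where a full reset would be needed instead.
 */
static bool soft_recover(struct fake_fence *f, int64_t timeout_ns)
{
	int64_t deadline;

	assert(timeout_ns >= 0);
	deadline = now_ns() + timeout_ns;

	while (!fake_fence_is_signaled(f) && deadline - now_ns() > 0)
		fake_soft_recovery(f);

	return fake_fence_is_signaled(f);
}
```

For example, with a fake fence set to signal 100 ms from now, soft_recover(&f, 10000 * NSEC_PER_USEC) (the old ktime_add_us(ktime_get(), 10000), i.e. 10 ms) gives up and returns false, while soft_recover(&f, 500 * NSEC_PER_MSEC) (the new ktime_add_ms(ktime_get(), 500)) returns true. A fence that has already signaled returns true without a single recovery attempt.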